Abstract

We consider a switch operating under the MaxWeight scheduling algorithm, under any traffic 
pattern such that all the ports are loaded. This system is interesting to study since the queue 
lengths exhibit a multi-dimensional state-space collapse in the heavy-traffic regime. We use a 
Lyapunov-type drift technique to characterize the heavy-traffic behavior of the expectation of 
the sum queue lengths in steady-state, under the assumption that all ports are saturated and 
all queues receive non-zero traffic. Under these conditions, we show that the heavy-traffic scaled
queue length is given by $\left(1 - \frac{1}{2n}\right)\|\boldsymbol{\sigma}\|^2$, where $\boldsymbol{\sigma}$ is the vector of the standard deviations of
arrivals to each port in the heavy-traffic limit. In the special case of uniform Bernoulli arrivals,
the corresponding formula is given by $\left(n - \frac{3}{2} + \frac{1}{2n}\right)$. The result shows that the heavy-traffic
scaled queue length has optimal scaling with respect to n, thus settling one version of an open 
conjecture; in fact, it is shown that the heavy-traffic queue length is at most within a factor 
of two from the optimal. We then consider certain asymptotic regimes where the load of the 
system scales simultaneously with the number of ports. We show that the MaxWeight algorithm 
has optimal queue length scaling behavior provided that the arrival rate approaches capacity 
sufficiently fast.


1 Introduction 

Consider a collection of queues arranged in the form of an n x n matrix. The queues are assumed 
to operate in discrete-time and jobs arriving to the queues will be called packets. The following 
constraints are imposed on the service process of the queueing system: (a) at most one queue can 
be served in each time slot in each row of the matrix, (b) at most one queue can be served in each 
time slot in each column of the matrix, and (c) when a queue is served, at most one packet can be 
removed from the queue. Such a queueing system is called a switch. 

A scheduling algorithm for the switch is a rule which selects the queues to be served in each time 
slot. A well-known algorithm called the MaxWeight algorithm is known to optimize the throughput 
in a switch. The algorithm was derived in a more general context in [1] and for the special context of
the switch considered here in [2], where it was also shown that other seemingly good policies are
not throughput-optimal. An important open question that is not fully understood is whether the 
MaxWeight algorithm is also queue length or delay optimal in any sense. In [3], it was shown that 
the MaxWeight algorithm minimizes the sum of the squares of the queue lengths in heavy-traffic 
under a condition called Complete Resource Pooling (CRP). For the switch, the CRP condition 
means that the arriving traffic saturates at most one column or one row of the switch. The result 
relies on the fact that, under CRP and in the heavy-traffic regime, there is a one-dimensional 
state-space collapse, i.e., the state of the system collapses to a line. When the CRP condition is
not met, the state space collapses to a lower-dimensional set that is not one-dimensional. State-space
collapse without the CRP condition was established in [3] when the arrivals are deterministic. For 
stochastic arrivals, state-space collapse for the fluid limit was studied in [5], and a diffusion limit 
has been established in [6]. However, a characterization of the steady-state behavior of the diffusion 
limit was still open. 

In this paper, we use the Lyapunov-type drift technique introduced in [7]. The basic idea is to 
set the drift of an appropriately chosen function equal to zero in steady-state to obtain both upper 
and lower bounds on quantities of interest, such as the moments of the queue lengths. To obtain 
upper bounds one has to establish state-space collapse in a sense that is somewhat different than 
the one in [3]: the main difference being that the state-space collapse is expressed in terms of the 
moments of the queue lengths in steady-state. This form of state-space collapse can then be readily 
used in the drift condition to obtain the upper bound. However, in [7], the usefulness of the drift 
technique was only established under the CRP condition. In this paper, we consider the switch 
with uniform traffic, i.e., where the arrival rates to all queues are equal. Thus, in the heavy-traffic 
regime, when the traffic in one column (or row) approaches its capacity, the traffic in all rows and 
columns approach capacity, and the CRP condition is violated. The main contribution of the paper 
is to characterize the expected steady-state queue lengths in heavy-traffic even though the CRP 
condition is violated. As mentioned earlier, when the CRP condition is violated, the state does not 
typically collapse to a single dimension. The main challenge in our proof is due to the difficulty in 
characterizing the behavior of the queue length process under such a multi-dimensional state-space 
collapse. Characterizing the behavior of the queue lengths under multi-dimensional state-space 
collapse has been difficult, in general, except in rare cases; see [8, 9] for two such examples in other
contexts. 

The difficulty in understanding the steady-state queue length behavior of the MaxWeight algorithm
has meant that it is unknown whether the MaxWeight algorithm minimizes the expected
total queue length in steady-state. One way to pose the optimality question is to increase the
number of queues in the system, or increase the arrival rate to a point close to the boundary of the
capacity region (the heavy-traffic regime), or do both, and study whether the MaxWeight algorithm
is queue-length-optimal in a scaling sense. A conjecture regarding the scaling behavior for
any algorithm, both in heavy-traffic and under all traffic conditions, has been stated in [10]. The
authors first heard about the non-heavy-traffic version of this conjecture from A. L. Stolyar in 2005.
The conjecture seemed to be difficult to verify for the MaxWeight algorithm, and so a number of 
other algorithms have been developed to achieve either optimal or near-optimal scaling behavior; 
see [HI da [TO]. The results in this paper establish the validity of one version of the conjecture 
(pertaining to uniform traffic in the heavy-traffic regime) for the MaxWeight algorithm. 

Note on Notation: The set of real numbers and the set of non-negative real numbers are denoted
by $\mathbb{R}$ and $\mathbb{R}_+$, respectively. We work in the $n^2$-dimensional Euclidean space $\mathbb{R}^{n^2}$. We represent
vectors in this space in bold font, by $\mathbf{x}$. We use two indices $1 \le i \le n$ and $1 \le j \le n$ for different
components of $\mathbf{x}$. We represent the $(i,j)$th component by $x_{ij}$ and thus, $\mathbf{x} = (x_{ij})_{ij}$. For two vectors
$\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^{n^2}$, their inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ and Euclidean norm $\|\mathbf{x}\|$ are defined by

$$\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij} y_{ij}, \qquad \|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij}^2}.$$

For two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^{n^2}$, $\mathbf{x} \le \mathbf{y}$ means $x_{ij} \le y_{ij}$ for every $(i,j)$. We use $\mathbf{1}$ to denote the all
ones vector. Let $\mathbf{e}^{(i)}$ denote the vector defined by $e^{(i)}_{ij} = 1$ for all $j$ and $e^{(i)}_{i'j} = 0$ for all $i' \ne i$ and
for all $j$. Thus, $\mathbf{e}^{(i)}$ is a matrix with the $i$th row being all ones and zeros everywhere else. Similarly, let
$\tilde{\mathbf{e}}^{(j)}$ denote the vector defined by $\tilde{e}^{(j)}_{ij} = 1$ for all $i$ and $\tilde{e}^{(j)}_{ij'} = 0$ for all $j' \ne j$ and for all $i$, i.e., it is a
matrix with the $j$th column being all ones and zeros everywhere else. For a random process $\mathbf{q}(t)$ and
a Lyapunov function $V(\cdot)$, we will sometimes use $V(t)$ to denote $V(\mathbf{q}(t))$. We use $\mathrm{Var}(\cdot)$ to denote
the variance of a random variable.

2 Preliminaries 

In this section, we will present the model of an input queued switch, the MaxWeight scheduling
algorithm, some observations on the geometry of the capacity region and other preliminaries.

2.1 System Model and MaxWeight Algorithm 

An input queued switch is a model for cross-bar switches that are widely used. An n x n switch has
n input ports and n output ports. We consider a discrete time system. In each time slot t, packets
arrive at any of the input ports to be delivered to any of the output ports. When scheduled, each
packet needs one time slot to be transmitted across.

Each input port maintains n separate queues, one each for packets to be delivered to each of
the n output ports. We denote the queue length of packets at input port i to be delivered at output
port j at time t by $q_{ij}(t)$. Let $\mathbf{q} \in \mathbb{R}^{n^2}$ denote the vector of all queue lengths.

Let $a_{ij}(t)$ denote the number of packet arrivals at input port i at time t to be delivered to
output port j, and we let $\mathbf{a} \in \mathbb{R}^{n^2}$ denote the vector $(a_{ij})_{ij}$. For every input-output pair $(i,j)$, the
arrival process $a_{ij}(t)$ is a stochastic process that is i.i.d. across time, with mean $\mathbb{E}[a_{ij}(t)] = \lambda_{ij}$ and
variance $\mathrm{Var}(a_{ij}(t)) = \sigma_{ij}^2$ for any time t. We assume that the arrival processes are independent
across input-output pairs (i.e., if $(i,j) \ne (i',j')$, the processes $a_{ij}(t)$ and $a_{i'j'}(t)$ are independent)
and are also independent of the queue lengths or schedules chosen in the switch. We further assume
that for all $i$, $j$ and $t$, $a_{ij}(t) \le a_{\max}$ for some $a_{\max} \ge 1$ and $P(a_{ij}(t) = 0) > \epsilon_a$ for some $\epsilon_a > 0$. The
arrival rate vector is denoted by $\boldsymbol{\lambda} = (\lambda_{ij})_{ij}$ and the variance vector $(\sigma_{ij}^2)_{ij}$ is denoted by $(\boldsymbol{\sigma})^2$ or
$\boldsymbol{\sigma}^2$. We will use $\boldsymbol{\sigma}$ to denote $(\sigma_{ij})_{ij}$.

In each time slot, each input port can be matched to only one output port and similarly, each
output port can be matched to only one input port. These constraints can be captured in a graph.
Let G denote a complete n x n bipartite graph with $n^2$ edges between the set of input ports and the
set of output ports. The schedule in each time slot is a matching on this graph G. We let $s_{ij} = 1$
if the link between input port i and output port j is matched or scheduled and $s_{ij} = 0$ otherwise,
and we denote $\mathbf{s} = (s_{ij})_{ij}$. Then, the set of feasible schedules, $\mathcal{S} \subset \mathbb{R}^{n^2}$, is defined as follows.

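$$\mathcal{S} = \Big\{ \mathbf{s} \in \{0,1\}^{n^2} : \sum_{i=1}^{n} s_{ij} \le 1, \ \sum_{j=1}^{n} s_{ij} \le 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\}.$$

Let $\mathcal{S}^*$ denote the set of maximal feasible schedules. Then, it is easy to see that

$$\mathcal{S}^* = \Big\{ \mathbf{s} \in \{0,1\}^{n^2} : \sum_{i=1}^{n} s_{ij} = 1, \ \sum_{j=1}^{n} s_{ij} = 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\}.$$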


Each element in this set corresponds to a perfect matching on the graph G. Each of these maximal
feasible schedules is also a permutation $\pi$ on the set $\{1, 2, \ldots, n\}$ with $\pi(i) = j$ if $s_{ij} = 1$.




A scheduling policy or algorithm picks a schedule s(t) in every time slot based on the current 
queue length vector, q(t). In each time slot, the order of events is as follows. Queue lengths at the 
beginning of time slot t are q(t). A schedule s(t) is then picked for that time slot based on the 
queue lengths. Then, arrivals for that time a(t) happen. Finally the packets are served and there 
is unused service if there are no packets in a scheduled queue. The queue lengths are then updated 
to give the queue lengths for the next time slot. The queue lengths therefore evolve as follows. 

$$\begin{aligned}
q_{ij}(t+1) &= [q_{ij}(t) + a_{ij}(t) - s_{ij}(t)]^+ \\
&= q_{ij}(t) + a_{ij}(t) - s_{ij}(t) + u_{ij}(t), \\
\mathbf{q}(t+1) &= \mathbf{q}(t) + \mathbf{a}(t) - \mathbf{s}(t) + \mathbf{u}(t),
\end{aligned}$$

where $[x]^+ = \max(0, x)$ is the projection onto the positive real axis and $u_{ij}(t)$ is the unused service on link
$(i,j)$. Unused service is 1 only when link $(i,j)$ is scheduled, but has zero queue length; and it is 0
in all other cases. Thus, when $u_{ij}(t) = 1$, we have $q_{ij}(t) = 0$, $a_{ij}(t) = 0$, $s_{ij}(t) = 1$ and
$q_{ij}(t+1) = 0$. Therefore, we have $u_{ij}(t) q_{ij}(t) = 0$, $u_{ij}(t) a_{ij}(t) = 0$ and $u_{ij}(t) q_{ij}(t+1) = 0$. Also
note that since $u_{ij}(t) \le s_{ij}(t)$, we have that $u_{ij}(t) \in \{0, 1\}$ for all $i, j$.
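The queue update can be illustrated with a short sketch. This is a minimal example, assuming the queue lengths, arrivals and schedule are stored as n x n lists of integers; the function name `step` and the sample numbers are hypothetical and not taken from the paper.

```python
def step(q, a, s):
    """One time slot of the queue evolution q(t+1) = [q + a - s]^+.

    q, a, s are n x n lists of non-negative integers; s is a 0/1 schedule.
    Returns the next queue lengths and the unused service u.
    """
    n = len(q)
    q_next = [[0] * n for _ in range(n)]
    u = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            backlog = q[i][j] + a[i][j] - s[i][j]
            q_next[i][j] = max(0, backlog)
            # Unused service is the amount by which the projection lifts
            # the backlog back to zero: it is 1 only if the link was
            # scheduled while the queue was empty and nothing arrived.
            u[i][j] = q_next[i][j] - backlog
    return q_next, u


# Hypothetical 2 x 2 example.
q = [[0, 2], [1, 0]]
a = [[0, 1], [0, 0]]
s = [[1, 0], [0, 1]]
q_next, u = step(q, a, s)
```

In this example both scheduled links are empty and receive no arrivals, so both units of service are unused, and the product of unused service and next queue length is zero on every link.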

The queue lengths process q(t) is a Markov chain. The switch is said to be stable under a 
scheduling policy if the sum of all the queue lengths is finite, i.e.. 



If the queue lengths process q(t) is positive recurrent under a scheduling policy, then we have 
stability. The capacity region of the switch is the set of arrival rates A for which the switch is 
stable under some scheduling policy. A policy that stabilizes the switch under any arrival rate 
in the capacity region is said to be throughput optimal. The MaxWeight algorithm is a popular
scheduling algorithm for switches. In every time slot t, each link $(i,j)$ is given a weight equal
to its queue length $q_{ij}(t)$ and the schedule with the maximum weight among the feasible schedules
$\mathcal{S}$ is chosen at that time slot. This algorithm is presented in Algorithm 1. It is possible to show
that the Markov chain $\mathbf{q}(t)$ is irreducible and aperiodic under the MaxWeight algorithm for an
appropriately defined state space [14, Exercise 4.2]. It is well known [1, 2] that the capacity region
$\mathcal{C}$ of the switch is the convex hull of all feasible schedules.


$$\begin{aligned}
\mathcal{C} = \mathrm{Conv}(\mathcal{S}) &= \Big\{ \boldsymbol{\lambda} \in \mathbb{R}_+^{n^2} : \sum_{i=1}^{n} \lambda_{ij} \le 1, \ \sum_{j=1}^{n} \lambda_{ij} \le 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\} \\
&= \Big\{ \boldsymbol{\lambda} \in \mathbb{R}_+^{n^2} : \langle \boldsymbol{\lambda}, \mathbf{e}^{(i)} \rangle \le 1, \ \langle \boldsymbol{\lambda}, \tilde{\mathbf{e}}^{(j)} \rangle \le 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\}.
\end{aligned}$$

For any arrival rate vector $\boldsymbol{\lambda}$, $\rho = \max_{i,j} \{ \sum_{i} \lambda_{ij}, \sum_{j} \lambda_{ij} \}$ is called the load. It is also known that the
queue lengths process is positive recurrent under the MaxWeight algorithm whenever the arrival
rate is in the interior of the capacity region $\mathcal{C}$ (equivalently, load $\rho < 1$), and therefore MaxWeight is
throughput optimal.

Note that there is always a maximum weight schedule that is maximal. If the MaxWeight
schedule chosen at time t, $\mathbf{s}$, is not maximal, there exists a maximal schedule $\mathbf{s}^* \in \mathcal{S}^*$ such that
$\mathbf{s} \le \mathbf{s}^*$. For any link $(i,j)$ such that $s_{ij} = 0$ and $s^*_{ij} = 1$, $q_{ij}(t) = 0$. If not, $\mathbf{s}$ would not have
been a maximum weight schedule. Therefore, we can pretend that the actual schedule chosen is
$\mathbf{s}^*$ and the links $(i,j)$ that are in $\mathbf{s}^*$ but not in $\mathbf{s}$ have an unused service of 1. Note that this does not
change the scheduling algorithm, but it is just a convenience in the proof. Therefore, without loss
of generality, we assume that the schedule chosen in each time slot is a maximal schedule, i.e.,

$$\mathbf{s}(t) \in \mathcal{S}^* \text{ for all time } t. \tag{2}$$

Hence the MaxWeight schedule picks one of the $n!$ possible permutations from the set $\mathcal{S}^*$ in each
time slot.

Algorithm 1 MaxWeight Scheduling Algorithm for an input-queued switch

Consider the complete bipartite graph between the input ports and output ports. Let the queue
length $q_{ij}(t)$ be the weight of the edge between input port i and output port j. A maximum weight
matching in this graph is chosen as the schedule in every time slot, i.e.,

$$\mathbf{s}(t) = \arg\max_{\mathbf{s} \in \mathcal{S}} \sum_{ij} q_{ij}(t) s_{ij} = \arg\max_{\mathbf{s} \in \mathcal{S}} \langle \mathbf{q}(t), \mathbf{s} \rangle \tag{1}$$

Ties are broken uniformly at random.
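As an illustration of the MaxWeight rule restricted to maximal schedules, the following sketch enumerates all n! permutations and keeps the one with the largest total queue length. It is a brute-force illustration for small n only, not an efficient matching algorithm; the function name `max_weight_schedule` and the example matrix are hypothetical, and ties are broken by enumeration order here rather than uniformly at random.

```python
from itertools import permutations


def max_weight_schedule(q):
    """Return a maximum weight perfect matching for queue matrix q.

    The schedule is returned as a tuple pi with pi[i] = j, meaning input
    port i is matched to output port j, together with its weight.
    """
    n = len(q)
    best_pi, best_weight = None, -1
    for pi in permutations(range(n)):
        weight = sum(q[i][pi[i]] for i in range(n))
        # Strict inequality: the first maximizer in enumeration order wins.
        if weight > best_weight:
            best_pi, best_weight = pi, weight
    return best_pi, best_weight


# Hypothetical 3 x 3 queue length matrix.
q = [[3, 1, 0],
     [2, 5, 1],
     [0, 4, 6]]
pi, weight = max_weight_schedule(q)
```

For this matrix the diagonal matching has weight 3 + 5 + 6 = 14, which exceeds every other permutation, so it is the schedule selected.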

For any arrival rate in the interior of the capacity region $\mathcal{C}$, due to positive recurrence of $\mathbf{q}(t)$, we have that
a steady state distribution exists under the MaxWeight policy. Let $\bar{\mathbf{q}}$ denote the steady state random
vector. In this paper, we focus on the average queue length under the steady state distribution,
i.e., $\mathbb{E}[\sum_{ij} \bar{q}_{ij}]$. We consider a set of systems indexed by $\epsilon$ with arrival rate $\boldsymbol{\lambda}^{(\epsilon)} = (1 - \epsilon)\boldsymbol{\nu}$, where
$\boldsymbol{\nu}$ is an arrival rate on the boundary of the capacity region $\mathcal{C}$ such that all the input and output
ports are saturated and $\nu_{ij} > 0$ for all $i,j$. The load of each system is then $(1 - \epsilon)$. We will study
the switch when $\epsilon \downarrow 0$. This is called the heavy traffic limit. We first show a universal lower bound
on the average queue length in the heavy traffic limit, i.e., on $\lim_{\epsilon \downarrow 0} \epsilon\, \mathbb{E}[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}]$. We then show that
under the MaxWeight policy, the limiting average queue length is within a factor of less than 2 of the
universal lower bound and thus MaxWeight has optimal average queue length scaling. We will show
these bounds using Lyapunov drift conditions. We will use several different quadratic Lyapunov
functions throughout the paper.
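The heavy-traffic parameterization can be made concrete with a small sketch. It assumes rate matrices stored as lists of `Fraction` values; the helper names `load` and `scaled_rates` and the uniform example are hypothetical. It scales a saturated rate matrix by (1 - eps) and confirms that the resulting load is 1 - eps.

```python
from fractions import Fraction


def load(lam):
    """Load rho: the largest row sum or column sum of the rate matrix."""
    n = len(lam)
    row_sums = [sum(lam[i][j] for j in range(n)) for i in range(n)]
    col_sums = [sum(lam[i][j] for i in range(n)) for j in range(n)]
    return max(row_sums + col_sums)


def scaled_rates(nu, eps):
    """Arrival rates lambda = (1 - eps) * nu."""
    return [[(1 - eps) * x for x in row] for row in nu]


# Hypothetical example: uniform traffic on a 3 x 3 switch, nu_ij = 1/3,
# so every row and column of nu sums to one (all ports saturated).
n = 3
nu = [[Fraction(1, n)] * n for _ in range(n)]
eps = Fraction(1, 10)
lam = scaled_rates(nu, eps)
rho = load(lam)
```

Exact rational arithmetic is used so that the load comes out as exactly 9/10 rather than a floating-point approximation.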

2.2 Geometry of the Capacity Region 

The capacity region $\mathcal{C}$ is a coordinate convex polytope in $\mathbb{R}^{n^2}$. Here, we review some basic
definitions. For any set $P \subseteq \mathbb{R}^{n^2}$, its dimension is defined by

$$\dim(P) = \min\{\dim(A) \mid P \subseteq A, \ A \text{ is an affine space}\}.$$

So the capacity region $\mathcal{C}$ has dimension $n^2$. A hyperplane $H$ is said to be a supporting hyperplane
of a polytope $P$ if $P \cap H \ne \emptyset$, $P \cap H_+ \ne \emptyset$ and $P \cap H_- = \emptyset$, where $H_+$ and $H_-$ are the open
half-spaces determined by the hyperplane $H$. For any supporting hyperplane $H$ of polytope $P$,
$P \cap H$ is called a face [15]. A face of a polytope is also a polytope with lower dimension. A face $F$
of polytope $P$ with dimension $\dim(F) = \dim(P) - 1$ is called a facet. Heavy traffic optimality of the
MaxWeight algorithm for generalized switches is shown in [3, 7] when a single input or output port
is saturated or, in other words, when approaching an arrival rate vector on a facet of the capacity
region. However, in this paper, we are interested in the case when all the ports are saturated. The
arrival rate vector $\boldsymbol{\nu}$ in this case does not lie on a facet and so that result is not applicable here.

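When $\boldsymbol{\nu}$ is the arrival rate vector on the boundary of the capacity region such that all the input
ports and all the output ports are saturated, it lies on the face $\mathcal{F}$,

$$\begin{aligned}
\mathcal{F} &= \Big\{ \boldsymbol{\lambda} \in \mathbb{R}_+^{n^2} : \sum_{i=1}^{n} \lambda_{ij} = 1, \ \sum_{j=1}^{n} \lambda_{ij} = 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\} \\
&= \Big\{ \boldsymbol{\lambda} \in \mathbb{R}_+^{n^2} : \langle \boldsymbol{\lambda}, \mathbf{e}^{(i)} \rangle = 1, \ \langle \boldsymbol{\lambda}, \tilde{\mathbf{e}}^{(j)} \rangle = 1 \ \ \forall\, i, j \in \{1, 2, \ldots, n\} \Big\}.
\end{aligned}$$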


It is easy to see that $\mathcal{F}$ as defined here is indeed a face by observing that the hyperplane $\langle \boldsymbol{\lambda}, \mathbf{1} \rangle = n$ is
a supporting hyperplane of the capacity region $\mathcal{C}$ and it contains any rate vector $\boldsymbol{\nu}$ where all the ports
are saturated. The face $\mathcal{F}$ has dimension $(n-1)^2 = n^2 - (2n-1)$, and lies in the affine space formed
by the intersection of the $2n$ constraints, $\{\sum_{i=1}^{n} \lambda_{ij} = 1 \ \forall j, \ \sum_{j=1}^{n} \lambda_{ij} = 1 \ \forall i\}$. Of
these $2n$ constraints, one is linearly dependent on the others and we have $2n - 1$ linearly independent
constraints. The face $\mathcal{F}$ is actually the convex hull of the maximal feasible schedules $\mathcal{S}^*$,
i.e., $\mathcal{F} = \mathrm{Conv}(\mathcal{S}^*)$. These results follow from the fact that the face $\mathcal{F}$ is the Birkhoff polytope
$B_n$ that contains all the n x n doubly stochastic matrices. It is known [16, page 20] that $B_n$ lies
in the $(n-1)^2$ dimensional affine space of the constraints and is a convex hull of the permutation
matrices.
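The dimension count can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `rank` and `constraint_vectors` are hypothetical, matrices are flattened row by row, and exact `Fraction` arithmetic is used in a plain Gaussian elimination. It builds the n row-indicator and n column-indicator vectors and confirms that exactly 2n - 1 of them are linearly independent, which leaves (n - 1)^2 dimensions for the face.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length vectors via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                factor = m[k][c] / m[r][c]
                m[k] = [x - factor * y for x, y in zip(m[k], m[r])]
        r += 1
    return r


def constraint_vectors(n):
    """Flattened row indicators and column indicators of an n x n matrix."""
    rows = [[1 if i == k else 0 for i in range(n) for j in range(n)]
            for k in range(n)]
    cols = [[1 if j == k else 0 for i in range(n) for j in range(n)]
            for k in range(n)]
    return rows + cols


n = 4
r = rank(constraint_vectors(n))
face_dim = n * n - r
```

The single dependency arises because the sum of all row indicators equals the sum of all column indicators (both give the all ones matrix).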

A facet of a polytope has a unique supporting hyperplane defining the facet. It was shown in
[7] that when the arrival rate vector approaches a rate vector in the relative interior of a facet,
in the limit, the queue length vector concentrates along the direction of the normal vector of
the unique supporting hyperplane. However, a lower dimensional face can be defined by one of
several hyperplanes, and so there is no unique normal vector. A lower dimensional face is always
an intersection of two or more facets. We are interested in the case when the arrival rate vector
approaches the vector $\boldsymbol{\nu}$ that lies on the face $\mathcal{F}$. The face $\mathcal{F}$ is the intersection of the $2n$ facets,
$\{\langle \mathbf{e}^{(i)}, \boldsymbol{\lambda} \rangle = 1\} \cap \mathcal{C}$ for all $i$, and $\{\langle \tilde{\mathbf{e}}^{(j)}, \boldsymbol{\lambda} \rangle = 1\} \cap \mathcal{C}$ for all $j$. We will show in Section 4 that in
the heavy traffic limit, the queue length vector concentrates within the cone spanned by the $2n$
normal vectors, $\{\mathbf{e}^{(i)} \text{ for all } i\} \cup \{\tilde{\mathbf{e}}^{(j)} \text{ for all } j\}$. We will call this cone $\mathcal{K}$. Here, we will present
some definitions and other results related to this cone. More formally, the cone $\mathcal{K}$ can be defined
as follows.

$$\mathcal{K} = \Big\{ \mathbf{x} \in \mathbb{R}^{n^2} : \mathbf{x} = \sum_{i} w_i \mathbf{e}^{(i)} + \sum_{j} \tilde{w}_j \tilde{\mathbf{e}}^{(j)}, \text{ where } w_i \in \mathbb{R}_+ \text{ and } \tilde{w}_j \in \mathbb{R}_+ \text{ for all } i, j \Big\}$$

Note that this means that for any $\mathbf{x} \in \mathcal{K}$ there are $w_i \in \mathbb{R}_+$ and $\tilde{w}_j \in \mathbb{R}_+$ for all $i, j \in \{1, 2, \ldots, n\}$
such that $x_{ij} = w_i + \tilde{w}_j$. However, such a representation need not be unique. For example, suppose
that $w_i \ge 1$ for all $i$; then setting $w'_i = w_i - 1$ for each $i$ and $\tilde{w}'_j = \tilde{w}_j + 1$ for each $j$, we again have
that $w'_i \in \mathbb{R}_+$, $\tilde{w}'_j \in \mathbb{R}_+$ and $x_{ij} = w'_i + \tilde{w}'_j$ for all $i,j$.

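The cone $\mathcal{K}$ lies in the $2n - 1$ dimensional subspace spanned by the $2n - 1$ independent vectors
among the $2n$ vectors, $\{\mathbf{e}^{(i)} \text{ for all } i\} \cup \{\tilde{\mathbf{e}}^{(j)} \text{ for all } j\}$. Call this space $V_{\mathcal{K}}$. For any two vectors
$\mathbf{x}, \mathbf{y} \in \mathcal{F}$, $\mathbf{x} - \mathbf{y}$ is orthogonal to the subspace $V_{\mathcal{K}}$, i.e.,

$$\mathbf{x} - \mathbf{y} \perp V_{\mathcal{K}}. \tag{3}$$

This is easy to see since $\langle \mathbf{x}, \mathbf{e}^{(i)} \rangle = \langle \mathbf{y}, \mathbf{e}^{(i)} \rangle = 1$ for all $i$ and $\langle \mathbf{x}, \tilde{\mathbf{e}}^{(j)} \rangle = \langle \mathbf{y}, \tilde{\mathbf{e}}^{(j)} \rangle = 1$ for all $j$. If
$V_{\mathcal{F}}$ denotes the subspace obtained by translating the affine space spanned by $\mathcal{F}$, it follows that the
spaces $V_{\mathcal{K}}$ and $V_{\mathcal{F}}$ are orthogonal because translation means subtraction by a vector. Moreover,
they span the entire space $\mathbb{R}^{n^2}$ since their dimensions sum to $n^2$. We now present a lemma about
the structure of any vector in the cone $\mathcal{K}$.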
Lemma 1. For any vector $\mathbf{x} \in \mathcal{K}$, we have

$$x_{ij} = \frac{1}{n} \sum_{j'=1}^{n} x_{ij'} + \frac{1}{n} \sum_{i'=1}^{n} x_{i'j} - \frac{1}{n^2} \sum_{i'=1}^{n} \sum_{j'=1}^{n} x_{i'j'}.$$


Remark 1. In other words, any element in the matrix $\mathbf{x}$ is equal to the average of all the elements
in its row plus the average of all the elements in its column minus the average of all the elements
in the whole matrix. Suppose the queue lengths $\mathbf{q} \in \mathcal{K}$; then any queue length from an input port
to an output port is equal to the average queue length at that input port plus the average queue
length at that output port minus the average queue length of all the queues in the switch.


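Proof. Since $x_{ij}$ is of the form $w_i + \tilde{w}_j$ for any $\mathbf{x} \in \mathcal{K}$, we have

$$\begin{aligned}
&\frac{1}{n} \sum_{j'=1}^{n} x_{ij'} + \frac{1}{n} \sum_{i'=1}^{n} x_{i'j} - \frac{1}{n^2} \sum_{i'=1}^{n} \sum_{j'=1}^{n} x_{i'j'} \\
&\quad = \frac{1}{n} \sum_{j'=1}^{n} (w_i + \tilde{w}_{j'}) + \frac{1}{n} \sum_{i'=1}^{n} (w_{i'} + \tilde{w}_j) - \frac{1}{n^2} \sum_{i'=1}^{n} \sum_{j'=1}^{n} (w_{i'} + \tilde{w}_{j'}) \\
&\quad = w_i + \frac{1}{n} \sum_{j'=1}^{n} \tilde{w}_{j'} + \frac{1}{n} \sum_{i'=1}^{n} w_{i'} + \tilde{w}_j - \frac{1}{n} \sum_{i'=1}^{n} w_{i'} - \frac{1}{n} \sum_{j'=1}^{n} \tilde{w}_{j'} \\
&\quad = w_i + \tilde{w}_j = x_{ij}.
\end{aligned}$$

This proves the lemma. □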
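The identity in Lemma 1 is easy to check numerically. The following sketch assumes integer matrices stored as nested lists; the helper name `satisfies_lemma1` and the example weights are hypothetical. It builds a matrix of the form x_ij = w_i + w~_j and verifies the row-average plus column-average minus overall-average formula, then shows that a generic matrix fails it.

```python
from fractions import Fraction


def satisfies_lemma1(x):
    """Check x_ij = row average + column average - overall average."""
    n = len(x)
    total_avg = Fraction(sum(sum(row) for row in x), n * n)
    for i in range(n):
        row_avg = Fraction(sum(x[i]), n)
        for j in range(n):
            col_avg = Fraction(sum(x[k][j] for k in range(n)), n)
            if x[i][j] != row_avg + col_avg - total_avg:
                return False
    return True


# Hypothetical non-negative weights (not from the paper).
w = [2, 0, 5]        # row weights w_i
w_tilde = [1, 3, 0]  # column weights
x_in_cone = [[wi + wj for wj in w_tilde] for wi in w]

# The identity matrix is not of the form w_i + w~_j.
x_generic = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Note that the identity is a necessary condition for membership in the cone, not a sufficient one: it also holds for matrices of the same additive form with negative weights.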


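2.2.1 Projection onto the cone $\mathcal{K}$

The cone $\mathcal{K}$ is closed and convex. For any $\mathbf{x} \in \mathbb{R}^{n^2}$, the closest point in the cone $\mathcal{K}$ to $\mathbf{x}$ is called
the projection of $\mathbf{x}$ onto the cone $\mathcal{K}$ and we will denote it by $\mathbf{x}_\parallel$. More formally,

$$\mathbf{x}_\parallel = \arg\min_{\mathbf{y} \in \mathcal{K}} \|\mathbf{x} - \mathbf{y}\|.$$

For a closed convex cone $\mathcal{K}$, the projection $\mathbf{x}_\parallel$ is well defined and is unique [17, Appendix E.9.2].
We will use $\mathbf{x}_\perp$ to denote $\mathbf{x} - \mathbf{x}_\parallel$. We will use $x_{\parallel ij}$ to denote the $(i,j)$th component of $\mathbf{x}_\parallel$. Similarly,
$x_{\perp ij}$ denotes the $(i,j)$th component of $\mathbf{x}_\perp$.

Note that unlike projection onto a subspace, projection onto a cone is not linear, i.e., in general $(\mathbf{x} + \mathbf{y})_\parallel \ne
\mathbf{x}_\parallel + \mathbf{y}_\parallel$. A simple counterexample is the following. In $\mathbb{R}^2$, let $\mathbf{x} = (2, 2)$ and $\mathbf{y} = (-1, -1)$. Consider
the positive quadrant as the cone of interest. Then, $\mathbf{x}_\parallel = (2, 2)$, $\mathbf{y}_\parallel = (0, 0)$ and $(\mathbf{x} + \mathbf{y})_\parallel = (1, 1)$.

Since for any $\mathbf{x} \in \mathbb{R}^{n^2}$, $\mathbf{x}_\parallel \in \mathcal{K}$, from the definition of the cone $\mathcal{K}$, we have that every component
of $\mathbf{x}_\parallel$ is non-negative, i.e., $x_{\parallel ij} \ge 0$. However, $\mathbf{x}_\perp$ could have negative components.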
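The counterexample on the positive quadrant can be reproduced in a few lines. This sketch projects onto the positive quadrant only (where the projection is a componentwise clamp), not onto the cone of the switch, which would require solving a quadratic program; the function name `project_onto_positive_quadrant` is hypothetical.

```python
def project_onto_positive_quadrant(x):
    """Euclidean projection onto the cone {x : x_k >= 0 for all k}."""
    return tuple(max(0, xk) for xk in x)


x = (2, 2)
y = (-1, -1)
x_plus_y = tuple(xk + yk for xk, yk in zip(x, y))

proj_x = project_onto_positive_quadrant(x)
proj_y = project_onto_positive_quadrant(y)
proj_sum = project_onto_positive_quadrant(x_plus_y)
sum_of_proj = tuple(a + b for a, b in zip(proj_x, proj_y))
```

Here the projection of the sum is (1, 1) while the sum of the projections is (2, 2), so the projection is not additive.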

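The polar cone $\mathcal{K}^\circ$ of the cone $\mathcal{K}$ is defined as

$$\mathcal{K}^\circ = \big\{ \mathbf{x} \in \mathbb{R}^{n^2} : \langle \mathbf{x}, \mathbf{y} \rangle \le 0 \text{ for all } \mathbf{y} \in \mathcal{K} \big\}.$$

The polar cone $\mathcal{K}^\circ$ is the negative of the dual cone $\mathcal{K}^*$ of the cone $\mathcal{K}$. For any $\mathbf{x} \in \mathbb{R}^{n^2}$, $\mathbf{x}_\perp \in \mathcal{K}^\circ$ and
$\langle \mathbf{x}_\parallel, \mathbf{x}_\perp \rangle = 0$ [17, Appendix E.9.2]. Therefore, the Pythagorean theorem is applicable, i.e.,

$$\|\mathbf{x}\|^2 = \|\mathbf{x}_\parallel\|^2 + \|\mathbf{x}_\perp\|^2 \tag{4}$$

and so, $\|\mathbf{x}_\parallel\| \le \|\mathbf{x}\|$ and $\|\mathbf{x}_\perp\| \le \|\mathbf{x}\|$.

Projection onto any closed convex set in $\mathbb{R}^{n^2}$ (and so onto a closed convex cone) is nonexpansive
[17, Appendix E.9.3]. Therefore, we have $\|\mathbf{x}_\parallel - \mathbf{y}_\parallel\| \le \|\mathbf{x} - \mathbf{y}\|$. Since $\mathbf{x}_\perp$ is a projection onto $\mathcal{K}^\circ$,
we also have

$$\|\mathbf{x}_\perp - \mathbf{y}_\perp\| \le \|\mathbf{x} - \mathbf{y}\|. \tag{5}$$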
2.3 Moment bounds from Lyapunov drift conditions

In this paper, we will use the Lyapunov drift based approach presented in [7] to obtain bounds on the
average queue length under MaxWeight. A key ingredient in this approach is to obtain moment
bounds from drift conditions. A lemma from [18] was used in [7] to obtain these bounds and we
first state it here as it was stated in [7].

Lemma 2. For an irreducible and aperiodic Markov chain $\{X(t)\}_{t \ge 0}$ over a countable state space
$\mathcal{X}$, suppose $Z : \mathcal{X} \to \mathbb{R}_+$ is a nonnegative-valued Lyapunov function. We define the drift of $Z$ at
$X$ as

$$\Delta Z(X) \triangleq [Z(X(t+1)) - Z(X(t))] \, \mathcal{I}(X(t) = X),$$

where $\mathcal{I}(\cdot)$ is the indicator function. Thus, $\Delta Z(X)$ is a random variable that measures the amount
of change in the value of $Z$ in one step, starting from state $X$. This drift is assumed to satisfy the
following conditions:

C1 There exists an $\eta > 0$ and a $\kappa < \infty$ such that for any $t = 1, 2, \ldots$ and for all $X \in \mathcal{X}$ with
$Z(X) \ge \kappa$,

$$\mathbb{E}[\Delta Z(X) \mid X(t) = X] \le -\eta.$$

C2 There exists a $D < \infty$ such that for all $X \in \mathcal{X}$,

$$P(|\Delta Z(X)| \le D) = 1.$$

Then, there exists a $\theta^* > 0$ and a $C^* < \infty$ such that

$$\limsup_{t \to \infty} \mathbb{E}\big[ e^{\theta^* Z(X(t))} \big] \le C^*.$$

If we further assume that the Markov chain $\{X(t)\}_t$ is positive recurrent, then $Z(X(t))$ converges
in distribution to a random variable $\bar{Z}$ for which

$$\mathbb{E}\big[ e^{\theta^* \bar{Z}} \big] \le C^*,$$

which directly implies that all moments of $\bar{Z}$ exist and are finite.

This lemma (and its original form in [18]) is quite general and versatile. However, we use a
different result in this paper to obtain moment bounds that are tighter than the bounds that can
be obtained using Lemma 2 (or its original form in [18]). The following lemma essentially follows
from [19, Theorem 1] except for some minor differences. The proof is presented in Appendix A and
makes use of Lemma 2.

Lemma 3. Consider an irreducible and aperiodic Markov chain $\{X(t)\}_{t \ge 0}$ over a
countable state space $\mathcal{X}$, and suppose $Z : \mathcal{X} \to \mathbb{R}_+$ is a nonnegative-valued Lyapunov function. The
drift $\Delta Z(X)$ of $Z$ at $X$ as defined in Lemma 2 is assumed to satisfy the conditions C1 and C2.
Further assume that the Markov chain $\{X(t)\}_t$ converges in distribution to a random variable $\bar{X}$.
Then, for any $m = 0, 1, 2, \ldots$,

$$P\big( Z(\bar{X}) > \kappa + 2Dm \big) \le \left( \frac{D}{D + \eta} \right)^{m+1}.$$

As a result, for any $r = 1, 2, \ldots$,

$$\mathbb{E}\big[ Z(\bar{X})^r \big] \le (2\kappa)^r + (4D)^r \left( \frac{D + \eta}{\eta} \right)^r r!.$$












3 Universal Lower Bound 


In this section, we will prove the following lower bound on the average queue lengths, which is valid 
under any scheduling policy. 

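Proposition 1. Consider a set of switch systems with the arrival processes $\mathbf{a}^{(\epsilon)}(t)$ described in
Section 2.1, parameterized by $0 < \epsilon < 1$, such that the mean arrival rate vector is $\boldsymbol{\lambda}^{(\epsilon)} = (1 - \epsilon)\boldsymbol{\nu}$
for some $\boldsymbol{\nu} \in \mathcal{F}$ and the variance is $(\boldsymbol{\sigma}^{(\epsilon)})^2$. The load is then $\rho = (1 - \epsilon)$. Fix a scheduling policy
under which the switch system is stable for any $0 < \epsilon < 1$. Let $\mathbf{q}^{(\epsilon)}(t)$ denote the queue lengths
process under this policy for each system. Suppose that this process converges in distribution to a
steady state random vector $\bar{\mathbf{q}}^{(\epsilon)}$. Then, for each of these systems, the average queue length is lower
bounded by

$$\mathbb{E}\Big[ \sum_{ij} \bar{q}^{(\epsilon)}_{ij} \Big] \ge \frac{\|\boldsymbol{\sigma}^{(\epsilon)}\|^2}{2\epsilon} - \frac{n(1 - \epsilon)}{2}.$$

Therefore, in the heavy-traffic limit as $\epsilon \downarrow 0$, if $(\boldsymbol{\sigma}^{(\epsilon)})^2 \to \boldsymbol{\sigma}^2$, we have

$$\liminf_{\epsilon \downarrow 0} \ \epsilon\, \mathbb{E}\Big[ \sum_{ij} \bar{q}^{(\epsilon)}_{ij} \Big] \ge \frac{\|\boldsymbol{\sigma}\|^2}{2}.$$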


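Proof. We will obtain a lower bound on the sum of all the queue lengths by lower bounding the queue
lengths at each input port, i.e., we will first bound $\mathbb{E}[\sum_j \bar{q}^{(\epsilon)}_{ij}]$ for a fixed input port $i$. We do this by
considering a single queue that is coupled to the process $\sum_j q^{(\epsilon)}_{ij}(t)$. Consider a single server queue
$\phi^{(\epsilon)}_i(t)$ in discrete time. Packets arrive into this queue to be served. Each packet needs exactly one
time slot of service. The arrival process to this queue is $a^{(\epsilon)}_i(t) = \sum_j a^{(\epsilon)}_{ij}(t)$. The mean arrival rate to this
queue is $\mathbb{E}[a^{(\epsilon)}_i(t)] = \sum_j \lambda^{(\epsilon)}_{ij} = (1 - \epsilon) \sum_j \nu_{ij} = (1 - \epsilon)$ since $\boldsymbol{\nu} \in \mathcal{F}$. As long as the queue is
non-empty, one packet is served in every time slot. Thus, this queue evolves as

$$\phi^{(\epsilon)}_i(t+1) = \big[ \phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1 \big]^+ = \phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1 + v^{(\epsilon)}_i(t),$$

where $v^{(\epsilon)}_i(t)$ is the unused service and so $v^{(\epsilon)}_i(t)\, \phi^{(\epsilon)}_i(t+1) = 0$. Clearly, $\phi^{(\epsilon)}_i(t)$ is positive recurrent
and let $\bar{\phi}^{(\epsilon)}_i$ denote the steady state random variable to which it converges in distribution.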

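Claim 1. In steady state,

$$\mathbb{E}\Big[ \sum_j \bar{q}^{(\epsilon)}_{ij} \Big] \ge \mathbb{E}\big[ \bar{\phi}^{(\epsilon)}_i \big].$$

Proof. Suppose that at time zero, the queue starts with $\phi^{(\epsilon)}_i(0) = \sum_j q^{(\epsilon)}_{ij}(0)$. Then, for any
time $t$, the queue $\phi^{(\epsilon)}_i(t)$ is stochastically no greater than $\sum_j q^{(\epsilon)}_{ij}(t)$. This can easily be seen using
induction. For $t = 0$, we have $\sum_j q^{(\epsilon)}_{ij}(0) \ge \phi^{(\epsilon)}_i(0)$. Suppose that $\sum_j q^{(\epsilon)}_{ij}(t) \ge \phi^{(\epsilon)}_i(t)$. Then, at time
$(t + 1)$,

$$\begin{aligned}
\sum_j q^{(\epsilon)}_{ij}(t+1) &= \sum_j \big[ q^{(\epsilon)}_{ij}(t) + a^{(\epsilon)}_{ij}(t) - s^{(\epsilon)}_{ij}(t) \big]^+ \\
&\ge \Big[ \sum_j \big( q^{(\epsilon)}_{ij}(t) + a^{(\epsilon)}_{ij}(t) - s^{(\epsilon)}_{ij}(t) \big) \Big]^+ \\
&\ge \big[ \phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1 \big]^+ \\
&= \phi^{(\epsilon)}_i(t+1),
\end{aligned}$$

where the last inequality follows from the inductive hypothesis, the definition of $a^{(\epsilon)}_i(t)$, the constraint
$\sum_j s^{(\epsilon)}_{ij}(t) \le 1$ and the fact that if $x \ge y$, we have that $[x]^+ \ge [y]^+$. Thus, we have that in steady state,
$\mathbb{E}[\sum_j \bar{q}^{(\epsilon)}_{ij}] \ge \mathbb{E}[\bar{\phi}^{(\epsilon)}_i]$. Since the steady state distribution of $\phi^{(\epsilon)}_i(t)$ does not depend on the
initial state at time zero, we have the lower bound $\mathbb{E}[\sum_j \bar{q}^{(\epsilon)}_{ij}] \ge \mathbb{E}[\bar{\phi}^{(\epsilon)}_i]$ independent of the initial
states $\phi^{(\epsilon)}_i(0)$ and $\sum_j q^{(\epsilon)}_{ij}(0)$. □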


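We will now bound $\mathbb{E}[\bar{\phi}^{(\epsilon)}_i]$. This result is obtained in [14]. We present it here for completeness.
Consider the drift of $\mathbb{E}[(\phi^{(\epsilon)}_i(t))^2]$.

$$\begin{aligned}
\mathbb{E}\big[ (\phi^{(\epsilon)}_i(t+1))^2 - (\phi^{(\epsilon)}_i(t))^2 \big]
&= \mathbb{E}\big[ (\phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1 + v^{(\epsilon)}_i(t))^2 - (\phi^{(\epsilon)}_i(t))^2 \big] \\
&\overset{(a)}{=} \mathbb{E}\big[ (\phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1)^2 - (v^{(\epsilon)}_i(t))^2 - (\phi^{(\epsilon)}_i(t))^2 \big] \\
&= \mathbb{E}\big[ (a^{(\epsilon)}_i(t) - 1)^2 + 2 \phi^{(\epsilon)}_i(t) (a^{(\epsilon)}_i(t) - 1) - (v^{(\epsilon)}_i(t))^2 \big] \\
&\overset{(b)}{=} \mathbb{E}\big[ (a^{(\epsilon)}_i(t) - (1 - \epsilon) - \epsilon)^2 \big] - 2\epsilon\, \mathbb{E}[\phi^{(\epsilon)}_i(t)] - \mathbb{E}[v^{(\epsilon)}_i(t)] \\
&\overset{(c)}{=} \mathrm{Var}\big( a^{(\epsilon)}_i(t) \big) + \epsilon^2 - 2\epsilon\, \mathbb{E}[\phi^{(\epsilon)}_i(t)] - \mathbb{E}[v^{(\epsilon)}_i(t)] \\
&\overset{(d)}{=} \sum_j \big( \sigma^{(\epsilon)}_{ij} \big)^2 + \epsilon^2 - 2\epsilon\, \mathbb{E}[\phi^{(\epsilon)}_i(t)] - \mathbb{E}[v^{(\epsilon)}_i(t)],
\end{aligned}$$

where (a) follows from noting that $v^{(\epsilon)}_i(t) (\phi^{(\epsilon)}_i(t) + a^{(\epsilon)}_i(t) - 1 + v^{(\epsilon)}_i(t)) = 0$; (b) follows from
independence of $\phi^{(\epsilon)}_i(t)$ and the arrivals and since $v^{(\epsilon)}_i(t) \in \{0, 1\}$; (c) follows from the
fact that $\mathbb{E}[a^{(\epsilon)}_i(t)] = (1 - \epsilon)$; (d) follows from the definition of $a^{(\epsilon)}_i(t)$ and independence of the
arrival process $a_{ij}(t)$ across ports. It can easily be shown that $\mathbb{E}[(\bar{\phi}^{(\epsilon)}_i)^2]$ is finite [HI Section 10.1].
Therefore, the steady state drift of $\mathbb{E}[(\phi^{(\epsilon)}_i(t))^2]$ is zero, i.e., in steady state, we get

$$2\epsilon\, \mathbb{E}\big[ \bar{\phi}^{(\epsilon)}_i \big] = \sum_j \big( \sigma^{(\epsilon)}_{ij} \big)^2 + \epsilon^2 - \mathbb{E}\big[ \bar{v}^{(\epsilon)}_i \big]. \tag{6}$$

Consider the drift of $\mathbb{E}[\phi^{(\epsilon)}_i(t)]$.

$$\mathbb{E}\big[ \phi^{(\epsilon)}_i(t+1) - \phi^{(\epsilon)}_i(t) \big] = \mathbb{E}\big[ a^{(\epsilon)}_i(t) - 1 + v^{(\epsilon)}_i(t) \big] = -\epsilon + \mathbb{E}\big[ v^{(\epsilon)}_i(t) \big]$$

Since $\bar{\phi}^{(\epsilon)}_i \in \mathbb{Z}_+$, we have $\bar{\phi}^{(\epsilon)}_i \le (\bar{\phi}^{(\epsilon)}_i)^2$, and so we get finiteness of $\mathbb{E}[\bar{\phi}^{(\epsilon)}_i]$ from that of
$\mathbb{E}[(\bar{\phi}^{(\epsilon)}_i)^2]$. Therefore, the drift of $\mathbb{E}[\phi^{(\epsilon)}_i(t)]$ is zero in steady state. Thus, we get that in steady
state, $\mathbb{E}[\bar{v}^{(\epsilon)}_i] = \epsilon$. Substituting this in (6), and using the claim, we get

$$\mathbb{E}\Big[ \sum_j \bar{q}^{(\epsilon)}_{ij} \Big] \ge \mathbb{E}\big[ \bar{\phi}^{(\epsilon)}_i \big] = \frac{1}{2\epsilon} \sum_j \big( \sigma^{(\epsilon)}_{ij} \big)^2 - \frac{1 - \epsilon}{2}. \tag{7}$$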

Since this lower bound is true for any input port $i$, summing over all the input ports, we have the
proposition. Note that we could have obtained the same bound by similarly lower bounding the
sum of lengths of all the queues destined to port $j$, i.e., $\mathbb{E}[\sum_i \bar{q}^{(\epsilon)}_{ij}]$. □

We do not know if this lower bound is tight, i.e., if there is a scheduling policy that attains
this lower bound. However, in Section 5 we show that under the MaxWeight scheduling algorithm, the
average queue lengths are within a factor of less than 2 away from this universal lower bound, thus
showing that MaxWeight has optimal scaling. Closing this gap is an open question.
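To get a feel for the size of the universal lower bound, the following sketch evaluates it for uniform Bernoulli arrivals. It is an illustration under stated assumptions: arrivals to queue (i, j) are Bernoulli with mean (1 - eps)/n, so the variance is p(1 - p); the helper names `lower_bound` and `bernoulli_variances` are hypothetical, and exact `Fraction` arithmetic is used.

```python
from fractions import Fraction


def lower_bound(sigma_sq, eps):
    """Lower bound ||sigma||^2 / (2 eps) - n (1 - eps) / 2.

    sigma_sq is the n x n matrix of arrival variances.
    """
    n = len(sigma_sq)
    norm_sq = sum(sum(row) for row in sigma_sq)
    return norm_sq / (2 * eps) - Fraction(n) * (1 - eps) / 2


def bernoulli_variances(lam):
    """Variance p * (1 - p) of a Bernoulli arrival with mean p."""
    return [[p * (1 - p) for p in row] for row in lam]


# Hypothetical example: uniform Bernoulli traffic on a 4 x 4 switch.
n = 4
eps = Fraction(1, 100)
lam = [[(1 - eps) / n] * n for _ in range(n)]
bound = lower_bound(bernoulli_variances(lam), eps)
scaled = eps * bound
```

For uniform Bernoulli traffic the limiting variance of each arrival is (1/n)(1 - 1/n), so the squared norm of the standard deviation vector tends to n - 1 and the scaled bound tends to (n - 1)/2 as eps goes to zero; with n = 4 and eps = 1/100 the scaled value is already close to 3/2.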

4 State Space Collapse under MaxWeight policy 

In this section, we will show that under the MaxWeight scheduling algorithm, in the heavy traffic
limit, the steady state queue length vector concentrates within the cone $\mathcal{K}$ in the following sense.
As the parameter $\epsilon$ approaches zero, the mean arrival rate approaches the boundary of the capacity
region and we know from the lower bound that the average queue lengths go to infinity at least as fast as order $1/\epsilon$.
We will show that under the MaxWeight algorithm, the $\bar{\mathbf{q}}^{(\epsilon)}_\perp$ component of the queue length vector
is upper bounded independent of $\epsilon$. Thus the $\bar{\mathbf{q}}^{(\epsilon)}_\perp$ component is negligible compared to the $\bar{\mathbf{q}}^{(\epsilon)}_\parallel$
component of $\bar{\mathbf{q}}^{(\epsilon)}$. This is called state space collapse. We say that the state space collapses to the
cone $\mathcal{K}$. It was shown in [20] that the state space collapses to the subspace containing the cone $\mathcal{K}$.
A similar result was also shown in [21] for a different problem. Here, we show the stronger result
that the state space collapses to the cone, which is essential to obtain the upper bounds in Section 5.

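We define the following Lyapunov functions and their corresponding drifts.

$$V(\mathbf{q}) \triangleq \|\mathbf{q}\|^2 = \sum_{ij} q_{ij}^2, \quad W_\perp(\mathbf{q}) \triangleq \|\mathbf{q}_\perp\|, \quad V_\perp(\mathbf{q}) \triangleq \|\mathbf{q}_\perp\|^2 = \sum_{ij} q_{\perp ij}^2, \quad V_\parallel(\mathbf{q}) \triangleq \|\mathbf{q}_\parallel\|^2 = \sum_{ij} q_{\parallel ij}^2$$

$$\begin{aligned}
\Delta V(\mathbf{q}) &\triangleq [V(\mathbf{q}(t+1)) - V(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q}) \\
\Delta W_\perp(\mathbf{q}) &\triangleq [W_\perp(\mathbf{q}(t+1)) - W_\perp(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q}) \\
\Delta V_\perp(\mathbf{q}) &\triangleq [V_\perp(\mathbf{q}(t+1)) - V_\perp(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q}) \\
\Delta V_\parallel(\mathbf{q}) &\triangleq [V_\parallel(\mathbf{q}(t+1)) - V_\parallel(\mathbf{q}(t))] \, \mathcal{I}(\mathbf{q}(t) = \mathbf{q})
\end{aligned}$$

We will use Lemma 3 with the Lyapunov function $W_\perp(\cdot)$ to bound the $\bar{\mathbf{q}}^{(\epsilon)}_\perp$ component in
steady state. We need the following lemma, which follows from concavity of the square root function
and the Pythagorean theorem (4). The proof of this lemma is similar to the proof of Lemma 7 in
[7] and so we skip it here.

Lemma 4. The drift of $W_\perp(\cdot)$ can be bounded in terms of the drifts of $V(\cdot)$ and $V_\parallel(\cdot)$ as follows.

$$\Delta W_\perp(\mathbf{q}) \le \frac{1}{2\|\mathbf{q}_\perp\|} \big( \Delta V(\mathbf{q}) - \Delta V_\parallel(\mathbf{q}) \big) \quad \forall\, \mathbf{q} \in \mathbb{R}^{n^2}$$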




We will now formally state the state space collapse result. 

Proposition 2. Consider a set of switch systems under the MaxWeight scheduling algorithm, with the
arrival processes $\mathbf{a}^{(\epsilon)}(t)$ described in Section 2.1, parameterized by $0 < \epsilon < 1$, such that the mean
arrival rate vector is $\boldsymbol{\lambda}^{(\epsilon)} = (1 - \epsilon)\boldsymbol{\nu}$ for some $\boldsymbol{\nu} \in \mathcal{F}$ such that $\nu_{\min} \triangleq \min_{ij} \nu_{ij} > 0$. The load is
then $\rho = (1 - \epsilon)$. Let the variance of the arrival process be $(\boldsymbol{\sigma}^{(\epsilon)})^2$. Let $\mathbf{q}^{(\epsilon)}(t)$ denote the queue
lengths process of each system, which is positive recurrent. Therefore, the process $\mathbf{q}^{(\epsilon)}(t)$ converges
to a steady state random vector in distribution, which we denote by $\bar{\mathbf{q}}^{(\epsilon)}$. Then, for each system
with $0 < \epsilon \le \nu_{\min} / (2\|\boldsymbol{\nu}\|)$, the steady state queue lengths vector satisfies



where 


Mi") = 2^ 


max 


'8(||aI 


")||2 


+ 


W ||2 


-, (V^e) ' 16-(nOmax + 1) 


e 


Remark 2. Note that for any $r$, the expressions $M_r^{(\epsilon)}$ are upper bounded by a constant not dependent
on $\epsilon$ whenever there exists a $\tilde{\sigma}$ which does not depend on $\epsilon$ such that $\|\sigma^{(\epsilon)}\|^2 \le \tilde{\sigma}^2$ for all $\epsilon$. This
is why we call this state space collapse. Our notion of state-space collapse considers the system
in steady-state, and is hence mathematically different from the state-space collapse result in [5],
although the results are similar in spirit.

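Proof. We will skip the superscript $(\epsilon)$ in this proof for ease of notation. Thus, we will use $q(t)$, $\lambda$
and $\sigma$ to denote $q^{(\epsilon)}(t)$, $\lambda^{(\epsilon)}$ and $\sigma^{(\epsilon)}$ respectively. We will verify both the conditions C.1 and C.2
to apply Lemma 3 for the Markov chain $q(t)$ and Lyapunov function $W_\perp(q(\cdot))$. First we consider
condition C.2.

$$\begin{aligned}
|\Delta W_\perp(q)| &= \big|\, \|q_\perp(t+1)\| - \|q_\perp(t)\| \,\big|\, \mathcal{I}(q(t) = q) \\
&\overset{(a)}{\le} \|q_\perp(t+1) - q_\perp(t)\| \\
&\overset{(b)}{\le} \|q(t+1) - q(t)\| \\
&\overset{(c)}{\le} n a_{\max} \qquad (8)
\end{aligned}$$

where (a) follows from the triangle inequality, i.e., $|\|x\| - \|y\|| \le \|x - y\|$, and $\mathcal{I}(\cdot) \le 1$; (b) follows from
nonexpansivity of the projection operator (5); (c) is true because each queue length can increase by
at most $a_{\max} \ge 1$ due to arrivals and can decrease by at most 1 due to departures. Thus condition
C.2 of Lemma 3 is true with $D = n a_{\max}$.

We will now verify C.1 using Lemma 4 by bounding the drifts $\Delta V(q)$ and $\Delta V_\parallel(q)$.

$$\begin{aligned}
E[\Delta V(q) \mid q(t) = q]
&= E\left[\|q(t+1)\|^2 - \|q(t)\|^2 \mid q(t) = q\right] \\
&= E\left[\|q(t) + a(t) - s(t) + u(t)\|^2 - \|q(t)\|^2 \mid q(t) = q\right] \\
&= E\left[\|q(t) + a(t) - s(t)\|^2 + \|u(t)\|^2 + 2\langle q(t+1) - u(t), u(t)\rangle - \|q(t)\|^2 \mid q(t) = q\right] \\
&\overset{(a)}{\le} E\left[\|a(t) - s(t)\|^2 + 2\langle q(t), a(t) - s(t)\rangle \mid q(t) = q\right] \\
&\overset{(b)}{=} E\Big[\sum_{ij}\big(a_{ij}^2(t) + s_{ij}(t) - 2 a_{ij}(t) s_{ij}(t)\big) \,\Big|\, q(t) = q\Big] + 2\langle q, \lambda - E[s(t) \mid q(t) = q]\rangle \\
&\overset{(c)}{=} \sum_{ij}\left(\lambda_{ij}^2 + \sigma_{ij}^2\right) + n - 2E\Big[\sum_{ij}\lambda_{ij} s_{ij}(t) \,\Big|\, q(t) = q\Big] + 2\langle q, \lambda - E[s(t) \mid q(t) = q]\rangle \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n - 2(1-\epsilon)E\Big[\sum_{ij}\nu_{ij} s_{ij}(t) \,\Big|\, q(t) = q\Big] + 2\langle q, \lambda - E[s(t) \mid q(t) = q]\rangle \\
&\le \|\lambda\|^2 + \|\sigma\|^2 + n + 2\langle q, \lambda - E[s(t) \mid q(t) = q]\rangle \qquad (9) \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n + 2\langle q, (1-\epsilon)\nu - E[s(t) \mid q(t) = q]\rangle \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle + 2\langle q, \nu - E[s(t) \mid q(t) = q]\rangle \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle + 2\min_{r \in \mathcal{C}}\langle q, \nu - r\rangle \qquad (10)
\end{aligned}$$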


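where (a) follows from the fact that $\langle q(t+1), u(t)\rangle = 0$ and dropping the $-\|u(t)\|^2$ term; (b) is
true because $s_{ij} \in \{0, 1\}$. Note that $E[a_{ij}^2(t)] = E[a_{ij}(t)]^2 + \mathrm{Var}(a_{ij}(t))$. Also note that the arrivals
in each time slot are independent of the queue lengths and hence are also independent of the service
process. These facts and (2) give (c). Since we use the MaxWeight scheduling algorithm, from (1), we
have (10). In order to bound the last term in (10), we present the following claim.

Claim 2. For any $q \in \mathbb{R}^{n^2}$,

$$\nu + \frac{\nu_{\min}}{\|q_\perp\|}\, q_\perp \in \mathcal{C}.$$

Proof. Since $|q_{\perp ij}| \le \|q_\perp\|$, we have $\nu_{ij} + \frac{\nu_{\min}}{\|q_\perp\|} q_{\perp ij} \ge \nu_{ij} - \nu_{\min} \ge 0$ and so $\nu + \frac{\nu_{\min}}{\|q_\perp\|} q_\perp \in \mathbb{R}_+^{n^2}$. We know
that $q_\perp \in \mathcal{K}^\circ$ and $e^{(i)} \in \mathcal{K}$, and so $\langle q_\perp, e^{(i)}\rangle \le 0$. Thus, for any $i$, we have

$$\Big\langle \nu + \frac{\nu_{\min}}{\|q_\perp\|} q_\perp,\ e^{(i)} \Big\rangle = \langle \nu, e^{(i)}\rangle + \frac{\nu_{\min}}{\|q_\perp\|}\langle q_\perp, e^{(i)}\rangle \le \langle \nu, e^{(i)}\rangle = 1,$$

where the last equality is due to the fact that $\nu \in \mathcal{F}$. Similarly, we can show that $\big\langle \nu + \frac{\nu_{\min}}{\|q_\perp\|} q_\perp, \tilde{e}^{(j)} \big\rangle \le 1$
for any $j$, proving the claim. □

Using the claim in (10), we get

$$\begin{aligned}
E[\Delta V(q) \mid q(t) = q] &\le \|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle + 2\Big\langle q,\ \nu - \Big(\nu + \frac{\nu_{\min}}{\|q_\perp\|} q_\perp\Big) \Big\rangle \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle - \frac{2\nu_{\min}}{\|q_\perp\|}\langle q_\parallel + q_\perp, q_\perp\rangle \\
&= \|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle - 2\nu_{\min}\|q_\perp\| \qquad (11)
\end{aligned}$$

where the last equality follows from the fact that $\langle q_\parallel, q_\perp\rangle = 0$. We will now bound the drift
$\Delta V_\parallel(q)$.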


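$$\begin{aligned}
E[\Delta V_\parallel(q) \mid q(t) = q]
&= E\left[\|q_\parallel(t+1)\|^2 - \|q_\parallel(t)\|^2 \mid q(t) = q\right] \\
&= E\left[\langle q_\parallel(t+1) + q_\parallel(t),\ q_\parallel(t+1) - q_\parallel(t)\rangle \mid q(t) = q\right] \\
&= E\left[\|q_\parallel(t+1) - q_\parallel(t)\|^2 \mid q(t) = q\right] + 2E\left[\langle q_\parallel(t),\ q_\parallel(t+1) - q_\parallel(t)\rangle \mid q(t) = q\right] \\
&\ge 2E\left[\langle q_\parallel(t),\ q_\parallel(t+1) - q_\parallel(t)\rangle \mid q(t) = q\right] \\
&= 2E\left[\langle q_\parallel(t),\ q(t+1) - q(t)\rangle \mid q(t) = q\right] - 2E\left[\langle q_\parallel(t),\ q_\perp(t+1) - q_\perp(t)\rangle \mid q(t) = q\right] \\
&\overset{(a)}{\ge} 2E\left[\langle q_\parallel(t),\ a(t) - s(t) + u(t)\rangle \mid q(t) = q\right] \\
&\overset{(b)}{\ge} 2\langle q_\parallel, \lambda\rangle - 2E\left[\langle q_\parallel, s(t)\rangle \mid q(t) = q\right] \\
&= -2\epsilon\langle q_\parallel, \nu\rangle - 2E\left[\langle q_\parallel, s(t) - \nu\rangle \mid q(t) = q\right] \\
&= -2\epsilon\langle q_\parallel, \nu\rangle \qquad (12)
\end{aligned}$$

Inequality (a) is true because $\langle q_\parallel(t), q_\perp(t)\rangle = 0$ and $\langle q_\parallel(t), q_\perp(t+1)\rangle \le 0$ since $q_\parallel(t) \in \mathcal{K}$ and
$q_\perp(t+1) \in \mathcal{K}^\circ$. All the components of $q_\parallel$ and $u(t)$ are nonnegative. Using this fact with independence
of the arrivals and the queue lengths gives inequality (b). The last equality follows from (5)
since $q_\parallel \in \mathcal{K}$, which lies in the subspace spanned by the cone, and $s(t), \nu \in \mathcal{F}$ from (2). Now substituting (11) and (12) in Lemma 4, we get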


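$$\begin{aligned}
E[\Delta W_\perp(q) \mid q(t) = q]
&\le \frac{1}{2\|q_\perp\|}\Big(\|\lambda\|^2 + \|\sigma\|^2 + n - 2\epsilon\langle q, \nu\rangle - 2\nu_{\min}\|q_\perp\| + 2\epsilon\langle q_\parallel, \nu\rangle\Big) \\
&= \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{2\|q_\perp\|} - \nu_{\min} - \frac{\epsilon}{\|q_\perp\|}\langle q_\perp, \nu\rangle \\
&\overset{(a)}{\le} \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{2\|q_\perp\|} - \nu_{\min} + \epsilon\|\nu\| \\
&\le \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{2\|q_\perp\|} - \frac{\nu_{\min}}{2} \qquad \text{whenever } \epsilon \le \frac{\nu_{\min}}{2\|\nu\|} \\
&\le -\frac{\nu_{\min}}{4} \qquad \text{for all } q \text{ such that } W_\perp(q) \ge \frac{4\left(\|\lambda\|^2 + \|\sigma\|^2 + n\right)}{\nu_{\min}},
\end{aligned}$$

where (a) is due to the Cauchy-Schwarz inequality, $\langle q_\perp, \nu\rangle \ge -\|q_\perp\|\|\nu\|$. Thus condition C.1 is
valid with $\kappa = \frac{4(\|\lambda\|^2 + \|\sigma\|^2 + n)}{\nu_{\min}}$ and $\eta = \frac{\nu_{\min}}{4}$. Then from Lemma 3 we get, for $r = 1, 2, \ldots$,

$$\begin{aligned}
E\left[\|\bar{q}_\perp\|^r\right]
&\le \left(\frac{8\left(\|\lambda\|^2 + \|\sigma\|^2 + n\right)}{\nu_{\min}}\right)^r + r!\left(\frac{16\, n a_{\max}}{\nu_{\min}}\right)^r \left(n a_{\max} + \frac{\nu_{\min}}{4}\right)^r \\
&\overset{(a)}{\le} \left(\frac{8\left(\|\lambda\|^2 + \|\sigma\|^2 + n\right)}{\nu_{\min}}\right)^r + \sqrt{r}\,e \left(\frac{r}{e}\right)^r \left(\frac{16\, n a_{\max}(n a_{\max} + 1)}{\nu_{\min}}\right)^r \\
&\le 2\max\left(\frac{8\left(\|\lambda\|^2 + \|\sigma\|^2 + n\right)}{\nu_{\min}},\ \left(\sqrt{r}\,e\right)^{1/r}\frac{r}{e}\cdot\frac{16\, n a_{\max}(n a_{\max} + 1)}{\nu_{\min}}\right)^r
\end{aligned}$$

where (a) follows from Stirling's upper bound on the factorial function, $r! \le \sqrt{r}\,e\left(\frac{r}{e}\right)^r$, and noting
that $\nu_{\min} \le 1$, which follows from the definition of $\nu_{\min}$ and the capacity region $\mathcal{C}$. The last inequality
follows from $a^r + b^r \le 2\max(a, b)^r$, proving the proposition. □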

Recall that there are n! maximal schedules (permutations or perfect matchings). For each of 
them, MaxWeight assigns a weight which is the sum of corresponding queue lengths and then 
picks the one with the maximum weight. In this process, it is equalizing the weights of all the 
schedules by serving the matching with maximum weight and thereby decreasing it. The cone $\mathcal{K}$
has the property that if the queue lengths vector $q$ is in the cone $\mathcal{K}$, we have $w_i$ and $\tilde{w}_j$ such that
$q_{ij} = w_i + \tilde{w}_j$. This means that all the maximal schedules have the same weight $\sum_i w_i + \sum_j \tilde{w}_j$ and
the MaxWeight algorithm is agnostic between them. Thus, the state space collapse result states
that in steady state, MaxWeight is (almost) successful in being able to equalize the weights of all 
maximal schedules in the heavy traffic limit. This behavior is very similar to Join-the-shortest 
queue (JSQ) routing policy in a supermarket checkout system. In such a system, there are a few 
servers, each with a queue. When a customer arrives to be served, under JSQ policy, (s)he picks 
the server with the shortest queue. It was shown in [7] that in the heavy traffic limit, the state of 
this system collapses to a state where all the queues are equal, and thus, JSQ is agnostic between 
all the queues when such a state space collapse occurs. Here JSQ policy is trying to equalize all the 
queues by increasing the shortest one, and it is (almost) successful in doing that in steady state in 
heavy traffic limit. 
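
The equal-weight property of the cone can be checked by brute force on a small instance. The sketch below is only an illustration: the values of `w` and `w_tilde`, the contrasting matrix `q_generic`, and the helper name `matching_weights` are hypothetical choices, not taken from the paper.

```python
from itertools import permutations

def matching_weights(q):
    """Weights of all n! perfect matchings of the n x n queue matrix q."""
    n = len(q)
    return [sum(q[i][p[i]] for i in range(n)) for p in permutations(range(n))]

# Hypothetical example values (not from the paper).
w = [3, 0, 5]          # row variables w_i
w_tilde = [1, 4, 2]    # column variables w~_j

# A queue matrix inside the cone: q_ij = w_i + w~_j.
q_cone = [[wi + wj for wj in w_tilde] for wi in w]
weights_cone = matching_weights(q_cone)

# A generic queue matrix, for contrast.
q_generic = [[4, 1, 0], [2, 3, 1], [0, 2, 5]]
weights_generic = matching_weights(q_generic)

print(sorted(set(weights_cone)))     # a single distinct weight
print(sorted(set(weights_generic)))  # several distinct weights
```

For the cone matrix every matching has weight equal to the sum of all `w` and `w_tilde` entries, so MaxWeight has no preference among schedules; for the generic matrix the weights differ and MaxWeight would pick the largest.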

A natural question in this context is whether there is any interpretation of the variables $w_i$ and $\tilde{w}_j$.
These variables are the optimal dual variables for the maximum weight matching problem. The
maximum weighted perfect matching problem in bipartite graphs (that MaxWeight solves in every
time slot) can be written as the integer program (13), and its linear program (LP) relaxation is the
linear program (14).

$$\max \sum_{ij} q_{ij} s_{ij} \quad \text{subject to: } \sum_i s_{ij} = 1\ \forall j, \quad \sum_j s_{ij} = 1\ \forall i, \quad s_{ij} \in \{0, 1\}\ \forall i, j. \qquad (13)$$

$$\max \sum_{ij} q_{ij} s_{ij} \quad \text{subject to: } \sum_i s_{ij} = 1\ \forall j, \quad \sum_j s_{ij} = 1\ \forall i, \quad s_{ij} \ge 0\ \forall i, j. \qquad (14)$$

It can be proved that the optimal solution of the LP relaxation (14) is identical to the optimal
solution of the original integer program (13) [22]. The dual of the LP (14) is the following.

$$\min \sum_i w_i + \sum_j \tilde{w}_j \quad \text{subject to: } w_i + \tilde{w}_j \ge q_{ij}\ \forall i, j. \qquad (15)$$

For any perfect matching $\pi$ and its corresponding schedule $s_{ij}$, and for any dual feasible $w_i, \tilde{w}_j$, we
have that $\sum_i q_{i\pi(i)} = \sum_{ij} q_{ij} s_{ij} \le \sum_i w_i + \sum_j \tilde{w}_j$. Suppose $s^*_{ij}$ is an optimal solution of (13) and
corresponds to a permutation $\pi^*$, and suppose $w^*_i, \tilde{w}^*_j$ is an optimal solution of (15). Then, from
strong duality, we have that

$$\sum_i q_{i\pi^*(i)} = \sum_{ij} q_{ij} s^*_{ij} = \sum_i w^*_i + \sum_j \tilde{w}^*_j. \qquad (16)$$

Moreover, any $\pi^*$ and $w^*, \tilde{w}^*$ that satisfy (16) are optimal solutions for problems (13) (or (14))
and (15) respectively. This means that any optimal perfect matching consists of only links $(i, j)$
such that $q_{ij} = w_i + \tilde{w}_j$. This property is also called complementary slackness. The Hungarian
assignment algorithm for solving the MaxWeight matching problem is based on this property. The
cone $\mathcal{K}$ has the special property that if $w_i, \tilde{w}_j$ is the optimal solution, then for any $(i, j)$, we have
$q_{ij} = w_i + \tilde{w}_j$ and so any perfect matching is an optimal matching and all perfect matchings have
the same weight.
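
The duality relations above can be illustrated on a small instance. The sketch below is a brute-force check, not the Hungarian algorithm; the matrix `q` and the particular dual choice (`w[i]` equal to the row maximum, `w_tilde[j]` equal to zero) are hypothetical and happen to be tight for this instance only.

```python
from itertools import permutations

def max_weight_matching(q):
    """Brute-force maximum weight perfect matching of an n x n matrix."""
    n = len(q)

    def weight(p):
        return sum(q[i][p[i]] for i in range(n))

    best = max(permutations(range(n)), key=weight)
    return best, weight(best)

# Hypothetical queue matrix (not from the paper).
q = [[4, 1, 0], [2, 3, 1], [0, 2, 5]]
n = len(q)
perm, primal_value = max_weight_matching(q)

# One dual-feasible choice: w_i = max_j q_ij and w~_j = 0.
w = [max(row) for row in q]
w_tilde = [0] * n
dual_value = sum(w) + sum(w_tilde)

# Dual feasibility: w_i + w~_j >= q_ij for all i, j.
feasible = all(w[i] + w_tilde[j] >= q[i][j] for i in range(n) for j in range(n))

# Complementary slackness: matched links satisfy q_ij = w_i + w~_j.
tight_on_matching = all(q[i][perm[i]] == w[i] + w_tilde[perm[i]] for i in range(n))

print(primal_value, dual_value, feasible, tight_on_matching)
```

Weak duality guarantees `primal_value <= dual_value` for any feasible dual; when the two values coincide, as they do here, both the matching and the dual variables are optimal and every matched link is tight.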

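The fact that all perfect matchings have the same weight when $q \in \mathcal{K}$ can be used to give an
alternate proof of Lemma 1. The average weight of all the $n!$ perfect matchings is $\frac{1}{n}\sum_{i'j'} q_{i'j'}$.
Now consider the matchings that contain the edge $ij$. There are $(n-1)!$ such matchings. The
total weight of all these matchings is $(n-1)!\, q_{ij} + (n-2)! \sum_{i' \ne i}\sum_{j' \ne j} q_{i'j'}$ because every edge $i'j'$
with $i' \ne i$ and $j' \ne j$ appears in $(n-2)!$ of these $(n-1)!$ matchings. Since all the matchings have the same weight, equating
the average weight of these $(n-1)!$ matchings to the average of all the matchings, we have

$$\begin{aligned}
q_{ij} + \frac{1}{n-1}\sum_{i' \ne i}\sum_{j' \ne j} q_{i'j'} &= \frac{1}{n}\sum_{i'j'} q_{i'j'} \\
q_{ij} + \frac{1}{n-1}\Big(\sum_{i'j'} q_{i'j'} - \sum_{j'=1}^n q_{ij'} - \sum_{i'=1}^n q_{i'j} + q_{ij}\Big) &= \frac{1}{n}\sum_{i'j'} q_{i'j'} \\
q_{ij}\Big(1 + \frac{1}{n-1}\Big) &= \frac{1}{n-1}\Big(\sum_{j'=1}^n q_{ij'} + \sum_{i'=1}^n q_{i'j}\Big) - \Big(\frac{1}{n-1} - \frac{1}{n}\Big)\sum_{i'j'} q_{i'j'} \\
q_{ij} &= \frac{1}{n}\sum_{j'=1}^n q_{ij'} + \frac{1}{n}\sum_{i'=1}^n q_{i'j} - \frac{1}{n^2}\sum_{i'j'} q_{i'j'} \qquad \text{when } n > 1,
\end{aligned}$$

which gives Lemma 1.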
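
The identity produced by this averaging argument can be verified with exact rational arithmetic. The sketch below is illustrative only: the helper name `row_col_identity_rhs` and the numeric values are hypothetical, and the identity is the final displayed equation of the argument above.

```python
from fractions import Fraction

def row_col_identity_rhs(q, i, j):
    """(1/n) * row sum + (1/n) * column sum - (1/n^2) * total sum."""
    n = len(q)
    row = sum(q[i])
    col = sum(q[k][j] for k in range(n))
    total = sum(map(sum, q))
    return Fraction(row + col, n) - Fraction(total, n * n)

# Hypothetical vector in the cone: q_ij = w_i + w~_j.
w = [3, 0, 5]
w_tilde = [1, 4, 2]
q_cone = [[wi + wj for wj in w_tilde] for wi in w]

holds_in_cone = all(
    row_col_identity_rhs(q_cone, i, j) == q_cone[i][j]
    for i in range(3) for j in range(3)
)

# A generic matrix need not satisfy the identity.
q_generic = [[4, 1, 0], [2, 3, 1], [0, 2, 5]]
holds_generic = all(
    row_col_identity_rhs(q_generic, i, j) == q_generic[i][j]
    for i in range(3) for j in range(3)
)

print(holds_in_cone, holds_generic)
```

Using `Fraction` avoids floating point round-off, so equality tests are exact.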


5 Asymptotically tight Upper and Lower bounds under MaxWeight 
policy 

In the previous section, we have shown that the queue length vector collapses within the cone $\mathcal{K}$ in
the steady state. We will use this result to obtain lower and upper bounds on the average queue
lengths under the MaxWeight algorithm. The lower and upper bounds differ only in $o(1/\epsilon)$ and so
match in the heavy traffic limit.

We will obtain these bounds by equating the drift of certain carefully chosen functions to
zero in steady state. We first define a few Lyapunov-type functions and their drifts, in addition to
the already defined $V(q) = \|q\|^2$.

$$V_1(q) = \sum_i \Big(\sum_j q_{ij}\Big)^2, \qquad V_2(q) = \sum_j \Big(\sum_i q_{ij}\Big)^2, \qquad V_3(q) = \Big(\sum_{ij} q_{ij}\Big)^2$$

$$\begin{aligned}
\Delta V_1(q) &\triangleq [V_1(q(t+1)) - V_1(q(t))]\,\mathcal{I}(q(t) = q) \\
\Delta V_2(q) &\triangleq [V_2(q(t+1)) - V_2(q(t))]\,\mathcal{I}(q(t) = q) \\
\Delta V_3(q) &\triangleq [V_3(q(t+1)) - V_3(q(t))]\,\mathcal{I}(q(t) = q)
\end{aligned}$$

The following lemma states that all these Lyapunov functions have finite expectations in steady state.

Lemma 5. Consider the switch under the MaxWeight scheduling algorithm. For any arrival rate
vector $\lambda$ in the interior of the capacity region, $\lambda \in \mathrm{int}(\mathcal{C})$, the steady state means $E[V(\bar{q})]$, $E[V_1(\bar{q})]$,
$E[V_2(\bar{q})]$ and $E[V_3(\bar{q})]$ are finite.

The lemma is proved in Appendix B. We will now state and prove the main result of this paper.

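Theorem 1. Consider a set of switch systems under the MaxWeight scheduling algorithm, with the
arrival processes $a^{(\epsilon)}(t)$ described in Section 2.1, parameterized by $0 < \epsilon < 1$, such that the mean
arrival rate vector is $\lambda^{(\epsilon)} = (1-\epsilon)\nu$ for some $\nu \in \mathcal{F}$ such that $\nu_{\min} \triangleq \min_{ij} \nu_{ij} > 0$. The load is then
$\rho = (1-\epsilon)$. Let the variance of the arrival process be $(\sigma^{(\epsilon)})^2$. The queue length process $q^{(\epsilon)}(t)$ for
each system converges in distribution to the steady state random vector $\bar{q}^{(\epsilon)}$. For each system with
$0 < \epsilon \le \nu_{\min}/2\|\nu\|$, the steady state average queue length satisfies

$$\Big(1 - \frac{1}{2n}\Big)\frac{\|\sigma^{(\epsilon)}\|^2}{\epsilon} - B_1(\epsilon, n) \le E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] \le \Big(1 - \frac{1}{2n}\Big)\frac{\|\sigma^{(\epsilon)}\|^2}{\epsilon} + B_2(\epsilon, n)$$

where

$$B_1(\epsilon, n) = -\frac{n\epsilon}{2} + n + 3n^{(2 - \frac{1}{r})}\epsilon^{(-\frac{1}{r})} M_r^{(\epsilon)} \quad \text{and} \quad B_2(\epsilon, n) = \frac{n\epsilon}{2} + \frac{n}{2} + 2n^{(2 - \frac{1}{r})}\epsilon^{(-\frac{1}{r})} M_r^{(\epsilon)}$$

for any $r \in \{2, 3, \ldots\}$. The terms $B_1(\epsilon, n)$ and $B_2(\epsilon, n)$ are both $o(1/\epsilon)$, i.e., $\lim_{\epsilon \downarrow 0} \epsilon B_1(\epsilon, n) = 0$ and
$\lim_{\epsilon \downarrow 0} \epsilon B_2(\epsilon, n) = 0$. Therefore, in the heavy traffic limit as $\epsilon \downarrow 0$, which means as the mean arrival
rate $\lambda^{(\epsilon)} \to \nu$, if $(\sigma^{(\epsilon)})^2 \to \sigma^2$, we have

$$\lim_{\epsilon \downarrow 0} \epsilon E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] = \Big(1 - \frac{1}{2n}\Big)\|\sigma\|^2.$$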
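
As a quick arithmetic check of the heavy-traffic limit, one can specialize it to uniform Bernoulli arrivals with rate $(1-\epsilon)/n$ per queue (the case mentioned in the abstract and treated in Section 6). The sketch below uses exact fractions; the function names are hypothetical, and the only modelling assumption is that each queue's arrival variance is $\lambda_{ij}(1 - \lambda_{ij})$, as for any Bernoulli variable.

```python
from fractions import Fraction

def sigma_norm_sq(n, eps):
    """Sum of the n^2 Bernoulli variances lam * (1 - lam), lam = (1 - eps) / n."""
    lam = (1 - Fraction(eps)) / n
    return n * n * lam * (1 - lam)

def heavy_traffic_constant(n):
    """(1 - 1/(2n)) * ||sigma||^2 in the limit eps -> 0."""
    return (1 - Fraction(1, 2 * n)) * sigma_norm_sq(n, 0)

for n in (2, 3, 8):
    # The two printed values agree: (1 - 1/(2n))(n - 1) = n - 3/2 + 1/(2n).
    print(n, heavy_traffic_constant(n), n - Fraction(3, 2) + Fraction(1, 2 * n))
```

The check confirms that $\|\sigma\|^2 = n - 1$ in this case, so the general limit reduces to $n - \frac{3}{2} + \frac{1}{2n}$.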


Proof. Fix an $0 < \epsilon \le \nu_{\min}/2\|\nu\|$ and consider the system with index $\epsilon$. For simplicity of
notation, we again skip the superscript $(\epsilon)$ in this proof and use $\bar{q}$ to denote the steady state
queue length vector. We will use $a$ to denote the arrival vector in steady state, which is identically
distributed to the random vector $a(t)$ for any time $t$. We will use $s(\bar{q})$ and $u(\bar{q})$ to denote the
schedule and unused service to show their dependence on the queue lengths. We will use $\bar{q}^+$ to
denote $\bar{q} + a - s(\bar{q}) + u(\bar{q})$, which is the queue lengths vector at time $t + 1$ if it was $\bar{q}$ at time $t$.
Clearly, $\bar{q}^+$ and $\bar{q}$ have the same distribution.

Define a new function $V_4(q)$ and its drift as follows.

$$V_4(q) = V_1(q) + V_2(q) - \frac{1}{n}V_3(q) = \sum_i \Big(\sum_j q_{ij}\Big)^2 + \sum_j \Big(\sum_i q_{ij}\Big)^2 - \frac{1}{n}\Big(\sum_{ij} q_{ij}\Big)^2$$

$$\Delta V_4(q) \triangleq [V_4(q(t+1)) - V_4(q(t))]\,\mathcal{I}(q(t) = q) = \Delta V_1(q) + \Delta V_2(q) - \frac{1}{n}\Delta V_3(q)$$

Since $-\frac{1}{n}V_3(q) \le V_4(q) \le V_1(q) + V_2(q)$, the steady state mean $E[V_4(\bar{q})]$ is finite from Lemma 5.
Therefore, the mean drift of $V_4(\cdot)$ in steady state is zero, i.e.,

$$E[\Delta V_4(\bar{q})] = E[V_4(\bar{q}^+)] - E[V_4(\bar{q})] = E[V_4(\bar{q})] - E[V_4(\bar{q})] = 0,$$

$$0 = E[\Delta V_4(\bar{q})] = E[\Delta V_1(\bar{q})] + E[\Delta V_2(\bar{q})] - \frac{1}{n}E[\Delta V_3(\bar{q})].$$

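Expanding the drift of $V_1(\cdot)$, we get

$$\begin{aligned}
E[\Delta V_1(\bar{q})] &= E[V_1(\bar{q} + a - s(\bar{q}) + u(\bar{q})) - V_1(\bar{q})] \\
&= E\Big[\sum_i \Big(\sum_j (\bar{q}_{ij} + a_{ij} - s_{ij}(\bar{q}) + u_{ij}(\bar{q}))\Big)^2 - \sum_i \Big(\sum_j \bar{q}_{ij}\Big)^2\Big] \\
&= E\Big[\sum_i \Big(\sum_j (a_{ij} - s_{ij}(\bar{q}))\Big)^2\Big] + 2E\Big[\sum_i \Big(\sum_j \bar{q}_{ij}\Big)\Big(\sum_j (a_{ij} - s_{ij}(\bar{q}))\Big)\Big] \\
&\quad - E\Big[\sum_i \Big(\sum_j u_{ij}(\bar{q})\Big)^2\Big] + 2E\Big[\sum_i \Big(\sum_j \bar{q}^+_{ij}\Big)\Big(\sum_j u_{ij}(\bar{q})\Big)\Big] \qquad (17)
\end{aligned}$$

Similarly expanding the drifts of $V_2(\cdot)$ and $V_3(\cdot)$ and substituting these expansions, along with (17), into the zero-drift equation above, we get the following expression.
Since this is a lengthy equation, we split it into various terms which we denote by $T_1$, $T_2$, $T_3$ and $T_4$.
For simplicity of notation, we suppress all the dependencies in terms of $\bar{q}$, $a$, $s(\bar{q})$ and $u(\bar{q})$.

$$T_1 = T_2 + T_3 + T_4 \qquad (18)$$

where

$$\begin{aligned}
T_1 &= 2E\Big[\sum_i \Big(\sum_j q_{ij}\Big)\Big(\sum_j (s_{ij} - a_{ij})\Big)\Big] + 2E\Big[\sum_j \Big(\sum_i q_{ij}\Big)\Big(\sum_i (s_{ij} - a_{ij})\Big)\Big] - \frac{2}{n}E\Big[\Big(\sum_{ij} q_{ij}\Big)\Big(\sum_{ij} (s_{ij} - a_{ij})\Big)\Big] \\
T_2 &= E\Big[\sum_i \Big(\sum_j (a_{ij} - s_{ij})\Big)^2\Big] + E\Big[\sum_j \Big(\sum_i (a_{ij} - s_{ij})\Big)^2\Big] - \frac{1}{n}E\Big[\Big(\sum_{ij} (a_{ij} - s_{ij})\Big)^2\Big] \\
T_3 &= -E\Big[\sum_i \Big(\sum_j u_{ij}\Big)^2\Big] - E\Big[\sum_j \Big(\sum_i u_{ij}\Big)^2\Big] + \frac{1}{n}E\Big[\Big(\sum_{ij} u_{ij}\Big)^2\Big] \\
T_4 &= 2E\Big[\sum_i \Big(\sum_j q^+_{ij}\Big)\Big(\sum_j u_{ij}\Big)\Big] + 2E\Big[\sum_j \Big(\sum_i q^+_{ij}\Big)\Big(\sum_i u_{ij}\Big)\Big] - \frac{2}{n}E\Big[\Big(\sum_{ij} q^+_{ij}\Big)\Big(\sum_{ij} u_{ij}\Big)\Big]
\end{aligned}$$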

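We will now bound each of the four terms. The schedule in each time slot is maximal (2) and so
$\sum_j s_{ij} = 1$, $\sum_i s_{ij} = 1$ and $\sum_{ij} s_{ij} = n$. Noting that the arrivals are independent of queue lengths,
we can simplify the term $T_1$ as follows.

$$\begin{aligned}
T_1 &= 2E\Big[\sum_i \Big(\sum_j q_{ij}\Big)\Big(1 - \sum_j a_{ij}\Big)\Big] + 2E\Big[\sum_j \Big(\sum_i q_{ij}\Big)\Big(1 - \sum_i a_{ij}\Big)\Big] - \frac{2}{n}E\Big[\Big(\sum_{ij} q_{ij}\Big)\Big(n - \sum_{ij} a_{ij}\Big)\Big] \\
&\overset{(a)}{=} 2\epsilon E\Big[\sum_i \sum_j q_{ij}\Big] + 2\epsilon E\Big[\sum_j \sum_i q_{ij}\Big] - \frac{2}{n}\, n\epsilon\, E\Big[\sum_{ij} q_{ij}\Big] \\
&= 2\epsilon E\Big[\sum_{ij} q_{ij}\Big]
\end{aligned}$$

where (a) follows from the fact that $\sum_j \lambda_{ij} = 1 - \epsilon$ and $\sum_i \lambda_{ij} = 1 - \epsilon$ since $\lambda = (1-\epsilon)\nu$ and
$\nu \in \mathcal{F}$.

Thus, from (18), we have

$$2\epsilon E\Big[\sum_{ij} \bar{q}_{ij}\Big] = T_2 + T_3 + T_4. \qquad (19)$$

Now the rest of the proof involves bounding the terms $T_2$, $T_3$ and $T_4$.

We start with the term $T_2$. Consider the first term of $T_2$. Again noting that the schedules are maximal (2), we get

$$\begin{aligned}
E\Big[\sum_i \Big(\sum_j (a_{ij} - s_{ij})\Big)^2\Big] &= E\Big[\sum_i \Big(\sum_j a_{ij} - 1\Big)^2\Big] = E\Big[\sum_i \Big(\sum_j a_{ij} - (1-\epsilon) - \epsilon\Big)^2\Big] \\
&\overset{(a)}{=} n\epsilon^2 + \sum_i \mathrm{Var}\Big(\sum_j a_{ij}\Big) \\
&\overset{(b)}{=} n\epsilon^2 + \sum_i \sum_j \mathrm{Var}(a_{ij}) \\
&= n\epsilon^2 + \|\sigma\|^2,
\end{aligned}$$

where (a) is true because $E\big[\sum_j a_{ij}\big] = (1-\epsilon)$; (b) follows from the independence of the arrival
processes across ports. Similarly, we can show that the second term in $T_2$ evaluates to
$E\big[\sum_j \big(\sum_i (a_{ij} - s_{ij})\big)^2\big] = n\epsilon^2 + \|\sigma\|^2$. The last term can likewise be evaluated as follows.

$$\begin{aligned}
\frac{1}{n}E\Big[\Big(\sum_{ij} (a_{ij} - s_{ij})\Big)^2\Big] &= \frac{1}{n}E\Big[\Big(\sum_{ij} a_{ij} - n\Big)^2\Big] = \frac{1}{n}E\Big[\Big(\sum_{ij} a_{ij} - n(1-\epsilon) - n\epsilon\Big)^2\Big] \\
&= \frac{1}{n}E\Big[\Big(\sum_{ij} a_{ij} - n(1-\epsilon)\Big)^2\Big] + n\epsilon^2 - 2\epsilon E\Big[\sum_{ij} a_{ij} - n(1-\epsilon)\Big] \\
&= n\epsilon^2 + \frac{1}{n}\sum_{ij} \mathrm{Var}(a_{ij}) \\
&= n\epsilon^2 + \frac{1}{n}\|\sigma\|^2.
\end{aligned}$$

Putting all the terms of $T_2$ together, we get

$$T_2 = n\epsilon^2 + \Big(2 - \frac{1}{n}\Big)\|\sigma\|^2. \qquad (20)$$

Since the queue lengths are nonnegative integers, we have $\sum_{ij} \bar{q}_{ij} \le \big(\sum_{ij} \bar{q}_{ij}\big)^2$. Using the fact that $E\big[\big(\sum_{ij} \bar{q}_{ij}\big)^2\big]$ is finite
from Lemma 5, we have that $E\big[\sum_{ij} \bar{q}_{ij}\big]$ is finite and so its drift is zero in steady state. Thus, we
get

$$\begin{aligned}
0 &= E\Big[\Big(\sum_{ij} q_{ij}(t+1) - \sum_{ij} q_{ij}(t)\Big)\mathcal{I}(q(t) = \bar{q})\Big] = E\Big[\sum_{ij} a_{ij} - \sum_{ij} s_{ij}(\bar{q}) + \sum_{ij} u_{ij}(\bar{q})\Big], \\
E\Big[\sum_{ij} u_{ij}(\bar{q})\Big] &= n - n(1-\epsilon) = n\epsilon. \qquad (21)
\end{aligned}$$

We will now bound the term $T_3$. Since $u_{ij}(t) \le s_{ij}(t)$, we have $\sum_j u_{ij} \le 1$, $\sum_i u_{ij} \le 1$ and
$\sum_{ij} u_{ij} \le n$. Therefore,

$$\begin{aligned}
-E\Big[\sum_i \Big(\sum_j u_{ij}(\bar{q})\Big)\Big] - E\Big[\sum_j \Big(\sum_i u_{ij}(\bar{q})\Big)\Big] &\le T_3 \le \frac{1}{n}E\Big[\Big(\sum_{ij} u_{ij}(\bar{q})\Big)^2\Big] \\
-2E\Big[\sum_{ij} u_{ij}(\bar{q})\Big] &\le T_3 \le \frac{1}{n}\, n\, E\Big[\sum_{ij} u_{ij}(\bar{q})\Big] \\
-2n\epsilon &\le T_3 \le n\epsilon \qquad (22)
\end{aligned}$$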

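We now consider the term $T_4$. It can be rewritten as follows, and can be split into two parts,
one each corresponding to $\bar{q}^+_\parallel$ and $\bar{q}^+_\perp$, where $\bar{q}^+_\parallel$ means $(\bar{q}^+)_\parallel$ and similarly $\bar{q}^+_\perp$.

$$\begin{aligned}
T_4 &= 2E\Big[\sum_{ij} u_{ij}(\bar{q})\Big(\sum_{j'} \bar{q}^+_{ij'} + \sum_{i'} \bar{q}^+_{i'j} - \frac{1}{n}\sum_{i'j'} \bar{q}^+_{i'j'}\Big)\Big] \\
&= 2E\Big[\sum_{ij} u_{ij}(\bar{q})\Big(\sum_{j'} \bar{q}^+_{\parallel ij'} + \sum_{i'} \bar{q}^+_{\parallel i'j} - \frac{1}{n}\sum_{i'j'} \bar{q}^+_{\parallel i'j'}\Big)\Big] + 2E\Big[\sum_{ij} u_{ij}(\bar{q})\Big(\sum_{j'} \bar{q}^+_{\perp ij'} + \sum_{i'} \bar{q}^+_{\perp i'j} - \frac{1}{n}\sum_{i'j'} \bar{q}^+_{\perp i'j'}\Big)\Big]
\end{aligned}$$

Since the vector $\bar{q}^+_\parallel$ is in the cone $\mathcal{K}$, by definition, Lemma 1 is applicable. Recall that when $u_{ij}(t) = 1$,
$q_{ij}(t+1) = 0$. Thus, when $u_{ij}(\bar{q}) = 1$, we have

$$\bar{q}^+_{ij} = 0, \qquad \bar{q}^+_{\parallel ij} = -\bar{q}^+_{\perp ij}, \qquad \frac{1}{n}\sum_{j'=1}^n \bar{q}^+_{\parallel ij'} + \frac{1}{n}\sum_{i'=1}^n \bar{q}^+_{\parallel i'j} - \frac{1}{n^2}\sum_{i'=1}^n \sum_{j'=1}^n \bar{q}^+_{\parallel i'j'} = -\bar{q}^+_{\perp ij}.$$

Therefore, we get

$$\sum_{ij} u_{ij}(\bar{q})\Big(\sum_{j'} \bar{q}^+_{\parallel ij'} + \sum_{i'} \bar{q}^+_{\parallel i'j} - \frac{1}{n}\sum_{i'j'} \bar{q}^+_{\parallel i'j'}\Big) = -n\sum_{ij} u_{ij}(\bar{q})\, \bar{q}^+_{\perp ij}$$

and the term $T_4$ reduces to

$$\begin{aligned}
T_4 &= 2E\Big[\sum_{ij} u_{ij}(\bar{q})\Big(-n\bar{q}^+_{\perp ij} + \sum_{j'} \bar{q}^+_{\perp ij'} + \sum_{i'} \bar{q}^+_{\perp i'j} - \frac{1}{n}\sum_{i'j'} \bar{q}^+_{\perp i'j'}\Big)\Big] \\
&= 2E\Big[\Big\langle u(\bar{q}),\ -n\bar{q}^+_\perp + \sum_i \langle \bar{q}^+_\perp, e^{(i)}\rangle e^{(i)} + \sum_j \langle \bar{q}^+_\perp, \tilde{e}^{(j)}\rangle \tilde{e}^{(j)} - \frac{1}{n}\langle \bar{q}^+_\perp, \mathbf{1}\rangle \mathbf{1}\Big\rangle\Big] \qquad (23)
\end{aligned}$$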


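Term $T_4$ is a critical term to bound and our choice of the Lyapunov function $V_4(\cdot)$ is motivated
primarily to obtain (23). We explain the motivation in detail at the end of this section. From
state space collapse, we know that $\bar{q}^+_\perp$ is bounded. We will now use this result to show that $T_4$ is
$o(1)$ as $\epsilon \downarrow 0$. Since $\bar{q}^+_\perp \in \mathcal{K}^\circ$ and $e^{(i)}, \tilde{e}^{(j)}, \mathbf{1} \in \mathcal{K}$ for all $i, j$, we have that $\langle \bar{q}^+_\perp, e^{(i)}\rangle \le 0$, $\langle \bar{q}^+_\perp, \tilde{e}^{(j)}\rangle \le 0$ and
$\langle \bar{q}^+_\perp, \mathbf{1}\rangle \le 0$. Moreover all components of $u$, $e^{(i)}$, $\tilde{e}^{(j)}$ and $\mathbf{1}$ take values 0 and 1. Therefore,

$$\begin{aligned}
T_4 &\le 2E\Big[\Big\langle u(\bar{q}),\ -n\bar{q}^+_\perp - \frac{1}{n}\langle \bar{q}^+_\perp, \mathbf{1}\rangle \mathbf{1}\Big\rangle\Big] \\
&\overset{(a)}{\le} 2\left(E\big[\|u(\bar{q})\|_{\tilde{r}}^{\tilde{r}}\big]\right)^{1/\tilde{r}} \Big(E\Big[\Big\|-n\bar{q}^+_\perp - \frac{1}{n}\langle \bar{q}^+_\perp, \mathbf{1}\rangle \mathbf{1}\Big\|_r^r\Big]\Big)^{1/r} \\
&\overset{(b)}{\le} 2(n\epsilon)^{1/\tilde{r}} \Big(E\Big[\Big(n\|\bar{q}^+_\perp\|_r + \frac{1}{n}|\langle \bar{q}^+_\perp, \mathbf{1}\rangle|\, \|\mathbf{1}\|_r\Big)^r\Big]\Big)^{1/r} \\
&\overset{(c)}{\le} 2(n\epsilon)^{1/\tilde{r}} \Big(E\Big[\Big(n\|\bar{q}^+_\perp\|_r + \frac{1}{n}\|\bar{q}^+_\perp\|_r \|\mathbf{1}\|_{\tilde{r}} \|\mathbf{1}\|_r\Big)^r\Big]\Big)^{1/r} \\
&\overset{(d)}{=} 2(n\epsilon)^{1/\tilde{r}} \Big(E\Big[\Big(n\|\bar{q}^+_\perp\|_r + \frac{1}{n} n^{2/\tilde{r}} n^{2/r} \|\bar{q}^+_\perp\|_r\Big)^r\Big]\Big)^{1/r} \\
&\overset{(e)}{=} 4n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} \left(E\big[\|\bar{q}^+_\perp\|_r^r\big]\right)^{1/r} \\
&\overset{(f)}{\le} 4n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} \left(E\big[\|\bar{q}^+_\perp\|_2^r\big]\right)^{1/r} \qquad \text{for } r \ge 2 \\
&\overset{(g)}{\le} 4n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} M_r \qquad \text{for } r \ge 2 \\
&= 4n^{(2 - \frac{1}{r})}\epsilon^{(1 - \frac{1}{r})} M_r \qquad \text{for } r \ge 2
\end{aligned}$$

where $\|x\|_r$ denotes the $\ell_r$ norm of a vector $x$, and $r, \tilde{r} \in (1, \infty)$ satisfy $1/r + 1/\tilde{r} = 1$. Inequality
(a) follows from Hölder's inequality for random vectors. The Cauchy-Schwarz inequality (which is
a special case of Hölder's inequality) may also be used to obtain the same bound in the heavy traffic
limit. However, in the non-heavy traffic limit, Hölder's inequality gives a tighter bound. Since
$u_{ij} \in \{0, 1\}$, from (21) we have $E\big[\|u(\bar{q})\|_{\tilde{r}}^{\tilde{r}}\big] = E\big[\sum_{ij} (u_{ij}(\bar{q}))^{\tilde{r}}\big] = E\big[\sum_{ij} u_{ij}(\bar{q})\big] = n\epsilon$. This
fact along with using the triangle inequality on the second term gives (b). Inequality (c) again follows
using Hölder's inequality for vectors. The $\ell_r$ norm of the vector $\mathbf{1}$ is $\|\mathbf{1}\|_r = n^{2/r}$; this gives (d). Since
$\frac{1}{r} + \frac{1}{\tilde{r}} = 1$, we have (e). For any vector $x$, if $0 < p \le p'$, we have $\|x\|_{p'} \le \|x\|_p$, and this gives
(f), and (g) follows from state space collapse in Proposition 2, since $\bar{q}^+$ and $\bar{q}$ have the same distribution. The last step follows from
$1/r + 1/\tilde{r} = 1$. Similarly, we can lower bound $T_4$ as follows.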


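$$T_4 \ge 2E\Big[\Big\langle u(\bar{q}),\ -n\bar{q}^+_\perp + \sum_i \langle \bar{q}^+_\perp, e^{(i)}\rangle e^{(i)} + \sum_j \langle \bar{q}^+_\perp, \tilde{e}^{(j)}\rangle \tilde{e}^{(j)}\Big\rangle\Big] \qquad (24)$$

Let's now focus on the middle term in the expectation above. From the definition of $e^{(i)}$, we have

$$\Big\|\sum_i \langle \bar{q}^+_\perp, e^{(i)}\rangle e^{(i)}\Big\|_r = \Big(n\sum_i \Big|\sum_j \bar{q}^+_{\perp ij}\Big|^r\Big)^{1/r} = \Big(n^{r+1}\sum_i \Big|\frac{1}{n}\sum_j \bar{q}^+_{\perp ij}\Big|^r\Big)^{1/r} \overset{(a)}{\le} \Big(n^r \sum_i \sum_j |\bar{q}^+_{\perp ij}|^r\Big)^{1/r} = n\|\bar{q}^+_\perp\|_r.$$

For any $(x_1, \ldots, x_n) \in \mathbb{R}^n$ and $r \ge 1$, from Jensen's inequality, we have $\big|\frac{1}{n}\sum_i x_i\big|^r \le \frac{1}{n}\sum_i |x_i|^r$. This
gives inequality (a) above. We have a similar bound for the last term in the expectation in (24). Using
both these bounds, together with Hölder's inequality and (21) as in the upper bound, the lower bound on $T_4$ becomes

$$\begin{aligned}
T_4 &\ge -6n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} \left(E\big[\|\bar{q}^+_\perp\|_r^r\big]\right)^{1/r} \\
&\ge -6n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} \left(E\big[\|\bar{q}^+_\perp\|_2^r\big]\right)^{1/r} \qquad \text{for } r \ge 2 \\
&\ge -6n^{(1 + \frac{1}{\tilde{r}})}\epsilon^{\frac{1}{\tilde{r}}} M_r \qquad \text{for } r \ge 2 \\
&= -6n^{(2 - \frac{1}{r})}\epsilon^{(1 - \frac{1}{r})} M_r \qquad \text{for } r \ge 2
\end{aligned}$$

where $\tilde{r}$ satisfies $1/r + 1/\tilde{r} = 1$. Combining the lower and upper bounds on $T_4$, for $r \ge 2$, we have

$$-6n^{(2 - \frac{1}{r})}\epsilon^{(1 - \frac{1}{r})} M_r \le T_4 \le 4n^{(2 - \frac{1}{r})}\epsilon^{(1 - \frac{1}{r})} M_r. \qquad (25)$$

Using (20), (22) and (25) to bound (19) and reintroducing the superscript $(\epsilon)$, we get the theorem. □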

We will now present the motivation for the choice of the function $V_4(\cdot)$. First consider a discrete-time
single server (G/G/1) queue, $q(t)$, that evolves according to $q(t+1) = q(t) + a(t) - s(t) + u(t)$.
The queue $\phi(t)$ in Section 3 is an example. Similar to [7], we can obtain tight lower and upper
bounds on the mean queue length in steady state by setting the drift of $E[q^2]$ to be zero in steady state,
i.e., $E[q^2(t+1)] = E[q^2(t)]$. Such a bound is called the Kingman bound. See [HI Section 10.1]. When
expanded, this equation again gives four terms, similar to the terms $T_1$, $T_2$, $T_3$ and $T_4$. The fourth
term $T_4$ then involves the product $u(q)\,q^+$, which is zero from the definition of unused service. This is an important
step in obtaining tight bounds.
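
The role of the unused service in the single-server case can be seen on a sample path. The sketch below is an illustration under assumed dynamics: one unit of service per slot and a hypothetical arrival distribution; the function names are not from the paper. It checks, slot by slot, that $q(t+1)u(t) = 0$ and that the one-step change of $q^2$ splits into the three remaining terms.

```python
import random

def queue_step(q, a, s):
    """One slot of q(t+1) = q(t) + a(t) - s(t) + u(t), with u the unused service."""
    u = max(s - q - a, 0)
    return q + a - s + u, u

def check_identities(steps, seed):
    rng = random.Random(seed)
    q = 0
    for _ in range(steps):
        a = rng.choice([0, 0, 1, 2])  # hypothetical arrival distribution
        s = 1                         # assumed: one unit of service per slot
        q_next, u = queue_step(q, a, s)
        # Unused service is positive only when the queue empties.
        assert q_next * u == 0
        # Hence q(t+1)^2 - q(t)^2 = (a - s)^2 + 2 q (a - s) - u^2.
        assert q_next**2 - q**2 == (a - s)**2 + 2 * q * (a - s) - u**2
        q = q_next
    return True

print(check_identities(1000, seed=0))
```

Because the cross term vanishes on every sample path, taking expectations in steady state leaves only terms that can be evaluated from the arrival and service moments, which is what makes the Kingman-type bound tight.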

Next, consider a load balancing system, similar to supermarket checkout lanes. There are $n$
servers with a separate queue for each server. Whenever a user arrives into the system, (s)he picks
one of the servers and joins the corresponding queue. We consider the 'Join the shortest queue' (JSQ)
policy, in which each user joins the queue with the shortest length. Ties are broken uniformly at
random. The queue length at server $i$ then evolves according to $q_i(t+1) = q_i(t) + a_i(t) - s_i(t) + u_i(t)$.
It was shown in [7] that the JSQ policy has minimum steady state sum queue lengths in heavy
traffic. This was done by first showing that the queue lengths collapse to a single dimension where
they are all equal. A tight upper bound is then obtained by setting the drift of the quadratic
function $\big(\sum_i q_i\big)^2$ to be zero in steady state. When this equation is expanded, we again have
four terms, the fourth one being of the form $E\big[\big(\sum_i q_i^+\big)\big(\sum_i u_i\big)\big]$. This is not zero in general
because of the cross terms. However, when the state is such that all the queue lengths are equal,
this term is zero. This is easy to see by considering the term $u_i \sum_{i'} q^+_{i'}$. When $u_i = 1$, we have
that $q_i^+ = 0$ and when all the queues are equal, for any $i'$, $q^+_{i'} = 0$.

Therefore, in all these systems, when using a quadratic Lyapunov function, the fourth term $T_4$ is
the most important and challenging one to bound correctly. Usually, it should be zero if state space
collapse is such that $\bar{q}_\perp = 0$. However, for the switch system, if we use the Lyapunov functions $V_1(\cdot)$ or
$V_2(\cdot)$ or $V_3(\cdot)$ or $V_1(\cdot) + V_2(\cdot)$, we do not have the property that $T_4 = 0$ when $\bar{q}_\perp = 0$. Armed with
Lemma 1, we add the additional $-\frac{1}{n}V_3(\cdot)$ term to $V_1(\cdot) + V_2(\cdot)$ to obtain the Lyapunov function $V_4(\cdot)$. We
have shown in (23) that $T_4$ is zero when $\bar{q}^+ \in \mathcal{K}$ (since $\bar{q}^+_\perp = 0$). The key idea in our upper bound
proof is the choice of the function $V_4(\cdot)$. Essentially, we picked the function $V_4(\cdot)$ so that it matches
the geometry of the cone $\mathcal{K}$ in the sense that if the queue length vector is in the cone $\mathcal{K}$, the
fourth term $T_4$ is zero.


6 Uniformly loaded switch under Bernoulli traffic 

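In this section, we consider the switch system when all the ports have Bernoulli traffic with the same
arrival rate. The lower and upper bound expressions then have a much simpler form. More precisely,
for the system with index $\epsilon$, for every input-output pair $(i, j)$, the arrival process $a^{(\epsilon)}_{ij}(t)$ is a
Bernoulli process with rate $\lambda^{(\epsilon)}_{ij} = (1-\epsilon)/n$. In other words, the rate vector approaches the vector
$\nu = \frac{1}{n}\mathbf{1} \in \mathcal{F}$ on the face $\mathcal{F}$ as $\epsilon \to 0$. Then, clearly the variance vector for the system with index
$\epsilon$ is $(\sigma^{(\epsilon)})^2 = \frac{1-\epsilon}{n}\big(1 - \frac{1-\epsilon}{n}\big)\mathbf{1}$ with $\|\sigma^{(\epsilon)}\|^2 = (1-\epsilon)(n - (1-\epsilon))$ and it converges to $\sigma^2 = \frac{1}{n}\big(1 - \frac{1}{n}\big)\mathbf{1}$.
Moreover, $a_{\max} = 1$ and $\nu_{\min} = \frac{1}{n}$. Using these values, we can restate Propositions 1 and 2 and
Theorem 1 as follows: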

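Theorem 2. Consider a set of switch systems with the Bernoulli arrival processes $a^{(\epsilon)}(t)$ parameterized
by $0 < \epsilon < 1$, such that the mean arrival rate vector is $\lambda^{(\epsilon)} = \frac{1-\epsilon}{n}\mathbf{1}$. Fix a scheduling policy
under which the switch system is stable for any $0 < \epsilon < 1$. Let $q^{(\epsilon)}(t)$ denote the queue lengths
process under this policy for each system. Suppose that this process converges in distribution to a
steady state random vector $\bar{q}^{(\epsilon)}$. Then, for each of these systems, the average queue length is lower
bounded by

$$E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] \ge \frac{(1-\epsilon)^2}{2\epsilon}(n - 1).$$

Therefore, in the heavy-traffic limit as $\epsilon \downarrow 0$, we have

$$\liminf_{\epsilon \downarrow 0} \epsilon E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] \ge \frac{n - 1}{2}.$$

Now consider the same switch systems operating under the MaxWeight scheduling algorithm.
The queue length process $q^{(\epsilon)}(t)$ of each system is positive recurrent and so converges to a steady
state random vector in distribution, $\bar{q}^{(\epsilon)}$. Then, for each system with $0 < \epsilon \le 1/2n$, the steady state
queue lengths vector collapses into the cone $\mathcal{K}$, in the sense that it satisfies

$$E\left[\|\bar{q}^{(\epsilon)}_\perp\|^r\right] \le (M_r)^r \quad \forall r \in \{1, 2, \ldots\}, \quad \text{where } M_r = \left(2\sqrt{r}\,e\right)^{1/r}\frac{16\,r}{e}\, n^2 (n + 1).$$

Therefore, the steady state average queue length satisfies

$$\frac{1}{\epsilon}\Big(n - \frac{3}{2} + \frac{1}{2n}\Big) - B_1(\epsilon, n) \le E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] \le \frac{1}{\epsilon}\Big(n - \frac{3}{2} + \frac{1}{2n}\Big) + B_2(\epsilon, n)$$

where

$$\begin{aligned}
B_1(\epsilon, n) &= \Big(1 - \frac{1}{2n}\Big)(n - 2 + \epsilon) + n - \frac{n\epsilon}{2} + 3n^{(2 - \frac{1}{r})}\epsilon^{(-\frac{1}{r})} M_r \quad \text{and} \\
B_2(\epsilon, n) &= -\Big(1 - \frac{1}{2n}\Big)(n - 2 + \epsilon) + \frac{n}{2} + \frac{n\epsilon}{2} + 2n^{(2 - \frac{1}{r})}\epsilon^{(-\frac{1}{r})} M_r
\end{aligned}$$

for any $r \in \{2, 3, \ldots\}$. The terms $B_1(\epsilon, n)$ and $B_2(\epsilon, n)$ are both $o\big(\frac{1}{\epsilon}\big)$. In the heavy traffic limit as
$\epsilon \downarrow 0$, which means as the mean arrival rate $\lambda^{(\epsilon)} \to \frac{1}{n}\mathbf{1}$, we have

$$\lim_{\epsilon \downarrow 0} \epsilon E\Big[\sum_{ij} \bar{q}^{(\epsilon)}_{ij}\Big] = n - \frac{3}{2} + \frac{1}{2n}.$$

Thus, the MaxWeight algorithm has optimal queue length scaling in the heavy traffic limit.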
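
The setting of the theorem can be explored numerically with a small simulation. The sketch below is only an illustration under stated assumptions: it finds the MaxWeight schedule by brute force over all permutations (practical only for small n), uses the update $q(t+1) = \max(q(t) + a(t) - s(t), 0)$ with the schedule chosen from $q(t)$, and reports a time average from an empty start rather than a steady-state mean. The function name and parameters are hypothetical.

```python
import random
from itertools import permutations

def simulate_maxweight(n, eps, steps, seed=0):
    """Time-averaged sum of queue lengths under MaxWeight, Bernoulli((1-eps)/n) arrivals."""
    rng = random.Random(seed)
    lam = (1 - eps) / n
    perms = list(permutations(range(n)))
    q = [[0] * n for _ in range(n)]
    total = 0
    for _ in range(steps):
        # MaxWeight: perfect matching with the largest total queue length.
        best = max(perms, key=lambda p: sum(q[i][p[i]] for i in range(n)))
        for i in range(n):
            for j in range(n):
                arrival = 1 if rng.random() < lam else 0
                service = 1 if best[i] == j else 0
                # Unused service is implicit in the max(..., 0).
                q[i][j] = max(q[i][j] + arrival - service, 0)
        total += sum(map(sum, q))
    return total / steps

n, eps = 3, 0.1
avg = simulate_maxweight(n, eps, steps=20000, seed=1)
print(eps * avg, (n - 1) / 2, n - 1.5 + 1 / (2 * n))
```

The printed triple is the scaled empirical average, the universal lower bound constant, and the MaxWeight heavy-traffic constant. For a fixed positive `eps` and a finite run the first number need not match the limit; the comparison is only indicative.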

Thus, in the heavy traffic limit, we have a universal lower bound on the ($\epsilon$ scaled) average queue
lengths that is $\Omega(n)$ and the MaxWeight policy achieves this bound within a factor less than 2.
Since we are interested in the asymptotics both in terms of the number of ports, $n$, and the distance from
the boundary of the capacity region, $\epsilon$, there are several possible limits in which the system can be
studied. The heavy traffic limit is one such asymptotic, where we first let the arrival rate approach
the boundary of the capacity region and look at the scaling of the average queue length in terms of $n$.
Another set of asymptotic regimes is when $\epsilon \to 0$ and $n \to \infty$ simultaneously. This can be studied
by setting $\epsilon = n^{-\beta}$ for $\beta > 0$. Such a limit was studied in [niiis] for scheduling algorithms that
are different from the MaxWeight algorithm studied here. The universal lower bound in such a
limit is $\Omega(n^{1+\beta})$. It is now easy to see the following corollary.

Corollary 1. Consider a sequence of switch systems with Bernoulli arrivals, indexed by $n$. The $n$th
system has mean arrival rate vector $\lambda^{(n)} = \frac{1 - \gamma_n n^{-\beta}}{n}\mathbf{1}$ with $\beta > 0$, and $\gamma_n > 0$ is a sequence that is
$\Theta(1)$. The load is $\rho^{(n)} = 1 - \gamma_n n^{-\beta}$. Fix a scheduling policy under which the switch system is stable
for any $n > 0$. Suppose that the queue lengths process $q^{(n)}(t)$ converges in distribution to a
steady state random vector $\bar{q}^{(n)}$. Then, for each of these systems, the average queue length is lower
bounded by

$$E\Big[\sum_{ij} \bar{q}^{(n)}_{ij}\Big] \ge \frac{(1 - \gamma_n n^{-\beta})^2}{2\gamma_n}\, n^\beta (n - 1)$$

and so is $\Omega(n^{1+\beta})$.

Under the MaxWeight scheduling policy, the queue lengths process $q^{(n)}(t)$ is positive
recurrent and so converges to a steady state random vector in distribution, $\bar{q}^{(n)}$. When $2\gamma_n \le n^{\beta - 1}$,
the steady state average queue length satisfies

$$\frac{n^{(1+\beta)}}{\gamma_n} - B_3(n) \le E\Big[\sum_{ij} \bar{q}^{(n)}_{ij}\Big] \le \frac{n^{(1+\beta)}}{\gamma_n} + B_4(n) \qquad \text{for } \beta > 4 \qquad (26)$$

where $B_3(n)$ and $B_4(n)$ are $o(n^{(1+\beta)})$. Thus, under the MaxWeight algorithm, the average sum
queue lengths is $\Theta(n^{(1+\beta)})$ and so has optimal scaling.

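Proof. The universal lower bound directly follows from Theorem 2 using $\epsilon^{(n)} = \gamma_n n^{-\beta}$. We will now
prove the second part of the corollary, which is under the MaxWeight policy. Since $2\gamma_n \le n^{\beta - 1}$, we
have $0 < \epsilon^{(n)} \le 1/2n$ and Theorem 2 is applicable. Therefore, we have (26) with

$$\begin{aligned}
B_3(n) &= \frac{3n^\beta - n^{\beta - 1}}{2\gamma_n} + \Big(1 - \frac{1}{2n}\Big)\big(n - 2 + \gamma_n n^{-\beta}\big) + n - \frac{1}{2}\gamma_n n^{(1 - \beta)} + 48\Big(\frac{2\sqrt{r}\,e}{\gamma_n}\Big)^{\frac{1}{r}}\frac{r}{e}\, n^{(4 + \frac{\beta - 1}{r})}(n + 1) \\
B_4(n) &= -\frac{3n^\beta - n^{\beta - 1}}{2\gamma_n} - \Big(1 - \frac{1}{2n}\Big)\big(n - 2 + \gamma_n n^{-\beta}\big) + \frac{n}{2} + \frac{1}{2}\gamma_n n^{(1 - \beta)} + 32\Big(\frac{2\sqrt{r}\,e}{\gamma_n}\Big)^{\frac{1}{r}}\frac{r}{e}\, n^{(4 + \frac{\beta - 1}{r})}(n + 1)
\end{aligned}$$

Clearly all but the last term above are $O(n^{\max(1, \beta)})$ and hence $o(n^{(1+\beta)})$. The last terms are $O(n^{(5 + \frac{\beta - 1}{r})})$. For any
$\beta > 4$, we can pick $r$ large enough so that $4 + \frac{\beta - 1}{r} < \beta$ and so we have that $B_3(n)$ and $B_4(n)$ are
$o(n^{(1+\beta)})$. □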


7 Conclusion 

We have obtained a characterization of the heavy-traffic behavior of the sum queue length in steady-state
in an n × n switch operating under the MaxWeight scheduling policy when all ports are
saturated. We then considered the special case of uniform Bernoulli traffic and studied the switch
in an asymptotic regime where the load increases simultaneously with the number of ports. We 
showed that the steady-state average queue lengths are within a factor less than 2 of a universal 
lower bound. The result settles one version of a conjecture regarding the performance of the 
MaxWeight policy. A number of extensions can be considered: 

• Extensions of the result to more general traffic patterns when only a few ports are saturated 
is an open problem. 

• We believe that one may also be able to allow correlations across time slots by making an
assumption similar to the assumption in Section II.C of [23], and considering the drift of the
Lyapunov function over multiple time slots. This extension may require a bit of additional
work.

• A Brownian limit has been established in the heavy-traffic regime in [6], but a characterization
of the behavior of this limit in steady-state is not known. We expect the mean of the sum
queue lengths (multiplied by $\epsilon$ and in the limit $\epsilon \to 0$) in steady-state that we have derived
to be equal to the sum of the steady-state expectations of the components of the Brownian
motion in [6]. This would be interesting to verify.

• Verifying whether the MaxWeight algorithm achieves the optimal queue-length scaling in the
size of the switch in non-heavy-traffic regimes is still an open problem.

A Proof of Lemma 3

Proof. Lemma 2 is applicable here and so we have that $E[Z(\bar{X})] < \infty$. Recall that $\Delta Z(x)$ is a
random variable for any $x$, so define

$$\tilde{D} = \sup_{x \in \mathcal{X}} \operatorname{ess\,sup} |\Delta Z(x)| = \sup_{x, x' \in \mathcal{X}:\ P(X(t+1) = x' \mid X(t) = x) > 0} |Z(x') - Z(x)|.$$

Also define

$$p_{\max} = \sup_{x \in \mathcal{X}} P\big(Z(X(t+1)) > Z(X(t)) \mid X(t) = x\big).$$

Then, from Theorem 1 in m, we have

$$P\big(Z(\bar{X}) > \kappa + 2\tilde{D}m\big) \le \Big(\frac{\tilde{D}p_{\max}}{\tilde{D}p_{\max} + \eta}\Big)^{m+1}.$$

Clearly, $\tilde{D} \le D$ and $p_{\max} \le 1$. Therefore, we get

$$P\big(Z(\bar{X}) > \kappa + 2Dm\big) \le P\big(Z(\bar{X}) > \kappa + 2\tilde{D}m\big) \le \Big(\frac{\tilde{D}p_{\max}}{\tilde{D}p_{\max} + \eta}\Big)^{m+1} \le \Big(\frac{D}{D + \eta}\Big)^{m+1},$$

where the last inequality follows from $\tilde{D}p_{\max} \le D$ and $m + 1 \ge 1$. This proves the first part of the
lemma. We will now use this result to obtain moment bounds. Since $r > 0$ and $Z(\cdot) \ge 0$, we have

$$E[Z(\bar{X})^r] = r\int_0^\infty t^{r-1} P\big(Z(\bar{X}) > t\big)\, dt$$

where (a) follows from $(a + b)^r \le 2^r \max(a, b)^r \le 2^r(a^r + b^r)$. It is known [21] that for $0 \le x < 1$
and $r = 1, 2, \ldots$, $\sum_{m=0}^\infty m^r x^m = \frac{1}{(1-x)^{r+1}}\sum_{k=0}^{r-1} A(r, k)x^{k+1}$, where $A(r, k)$ are called the Eulerian
numbers. It is also known that $\sum_{k=0}^{r-1} A(r, k) = r!$. Therefore, when $0 \le x < 1$, we have $\sum_{m=0}^\infty m^r x^m \le \frac{r!}{(1-x)^{r+1}}$.
Using this relation, we get

$$E[Z(\bar{X})^r] \le (2\kappa)^r + (4D)^r \Big(\frac{D + \eta}{\eta}\Big)^r r!.$$

□


B Proof of Lemma [5] 

Proof. We will use Lemma 2 to first show that $E[V(\bar{q})]$ is finite. Define the Lyapunov function
$W(q) = \|q\| = \sqrt{V(q)}$, and its drift

$$\Delta W(q) \triangleq [W(q(t+1)) - W(q(t))]\,\mathcal{I}(q(t) = q).$$

We will first verify condition C.2 of Lemma 2. Using the same arguments as in (8), we get

$$\begin{aligned}
|\Delta W(q)| &= \big|\, \|q(t+1)\| - \|q(t)\| \,\big|\, \mathcal{I}(q(t) = q) \\
&\le \|q(t+1) - q(t)\|\, \mathcal{I}(q(t) = q) \\
&\le n \max_{ij} |q_{ij}(t+1) - q_{ij}(t)|\, \mathcal{I}(q(t) = q) \\
&\le n a_{\max},
\end{aligned}$$

thus verifying condition C.2. We will now verify condition C.1.

$$\begin{aligned}
E[\Delta W(q) \mid q(t) = q] &= E\left[\|q(t+1)\| - \|q(t)\| \mid q(t) = q\right] \\
&= E\Big[\sqrt{\|q(t+1)\|^2} - \sqrt{\|q(t)\|^2} \,\Big|\, q(t) = q\Big] \\
&\overset{(a)}{\le} \frac{1}{2\|q\|} E\left[\|q(t+1)\|^2 - \|q(t)\|^2 \mid q(t) = q\right] = \frac{1}{2\|q\|} E[\Delta V(q) \mid q(t) = q] \\
&\overset{(b)}{\le} \frac{1}{2\|q\|}\Big(\|\lambda\|^2 + \|\sigma\|^2 + n + 2\langle q, \lambda - E[s(t) \mid q(t) = q]\rangle\Big) \\
&\overset{(c)}{=} \frac{1}{2\|q\|}\Big(\|\lambda\|^2 + \|\sigma\|^2 + n + 2\min_{r \in \mathcal{C}}\langle q, \lambda - r\rangle\Big) \\
&\overset{(d)}{\le} \frac{1}{2\|q\|}\Big(\|\lambda\|^2 + \|\sigma\|^2 + n + 2\langle q, \lambda - (\lambda + \epsilon_1 \mathbf{1})\rangle\Big) \\
&= \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{2\|q\|} - \epsilon_1 \frac{\|q\|_1}{\|q\|} \\
&\overset{(e)}{\le} \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{2\|q\|} - \epsilon_1 \\
&\le -\frac{\epsilon_1}{2} \qquad \text{for all } q \text{ such that } W(q) \ge \frac{\|\lambda\|^2 + \|\sigma\|^2 + n}{\epsilon_1},
\end{aligned}$$

where $\sigma^2$ denotes the variance vector and $\|q\|_1 = \sum_{ij} q_{ij}$ denotes the $\ell_1$ norm of $q$. Inequality (a)
follows from the concavity of the square root function, due to which we have that $\sqrt{y} - \sqrt{x} \le \frac{1}{2\sqrt{x}}(y - x)$.
Inequality (b) follows from the bound on the drift of $V(\cdot)$ obtained in (9) in the proof of
Proposition 2; (c) follows from the fact that we use MaxWeight scheduling. Since $\lambda \in \mathrm{int}(\mathcal{C})$, there
exists an $\epsilon_1 > 0$ such that $\lambda + \epsilon_1 \mathbf{1} \in \mathcal{C}$. This gives (d). For any vector $x$, its $\ell_1$ norm is at least its
$\ell_2$ norm, i.e., $\|x\|_1 \ge \|x\|$. This gives inequality (e). Thus, condition C.1 is verified and we have
that all moments of $W(\bar{q})$ exist in steady state. In particular, we have that $E[V(\bar{q})]$ is finite.

Now, note that

$$V_3(q) = \Big(\sum_{ij} q_{ij}\Big)^2 \le \Big(n^2 \max_{ij} q_{ij}\Big)^2 = n^4 \max_{ij} q_{ij}^2 \le n^4 \sum_{ij} q_{ij}^2 = n^4 V(q).$$

Thus, $E[V_3(\bar{q})]$ is also finite. The lemma follows by noting that $V_1(q) \le V_3(q)$ and $V_2(q) \le V_3(q)$. □